Cannot connect to server for downloading video files from current development installs #3704

Closed
EdDixon opened this issue May 22, 2015 · 24 comments

@EdDixon
Contributor

EdDixon commented May 22, 2015

When downloading videos from the current development installations, we are unable to connect to the server. The screenshot below is from a Mac installation made with the current in-development installer. local_settings.py was not modified, and I double-checked it to ensure that nothing redirects the video resources.
2015-05-22_15-17-46

@EdDixon EdDixon changed the title Error when downloading video files from current development installs Cannot connet to server for downloading video files from current development installs May 22, 2015
@EdDixon EdDixon changed the title Cannot connet to server for downloading video files from current development installs Cannot connect to server for downloading video files from current development installs May 22, 2015
@EdDixon
Contributor Author

EdDixon commented May 24, 2015

Just a note here: this test was performed immediately after installation. Restarting the server seems to have resolved the problem. I am rebooting now to see whether the problem returns or remains resolved.

@EdDixon
Contributor Author

EdDixon commented May 24, 2015

A reboot did not resolve the issue. To the user it appears as if the server is down or the URL is not resolving to the appropriate server.
2015-05-24_05-31-58
I did not mean to close this issue!

@EdDixon EdDixon closed this as completed May 24, 2015
@EdDixon EdDixon reopened this May 24, 2015
@cpauya cpauya added the Mac label May 25, 2015
@MCGallaspy MCGallaspy added this to the 0.14.0 milestone May 26, 2015
@MCGallaspy MCGallaspy self-assigned this Jun 2, 2015
@MCGallaspy
Contributor

Does this issue only occur on Mac? Have you been able to reproduce it @ed? Were you logged in as a coach or an admin?

@MCGallaspy
Contributor

Unable to reproduce. @cpauya, can you try this on Mac?

@cpauya
Contributor

cpauya commented Jun 3, 2015

I was able to reproduce this a few days ago @MCGallaspy - I will try again with the latest build from dungeon server: http://dungeon.learningequality.org:8085/browse/KL-KALITE014/latest and post a screenshot here.

@cpauya
Contributor

cpauya commented Jun 3, 2015

Yes @MCGallaspy, video downloads are failing on the latest Mac installer too (after successfully doing a One-click Registration). I am logged in as admin.

Here's the browser screenshot on Firefox v38.0.5:
screenshot 2015-06-03 14 34 38

After the failed attempt to download a video, I ran kalite status on the terminal and got an "Unclean shutdown (7)" message.

cyril-lappy6:~ cyril$ kalite status
Unclean shutdown (7)

There are also no logs, because the crash happened in the pyrun2.7 binary, which means the ka-lite server has stopped. Here are the first few lines of the crash report:

Process:               pyrun2.7 [17404]
Path:                  /Applications/KA-Lite Monitor.app/Contents/Resources/pyrun-2.7/bin/pyrun2.7
Identifier:            pyrun2.7
Version:               ???
Code Type:             X86-64 (Native)
Parent Process:        ??? [1]
Responsible:           KA-Lite Monitor [17265]
User ID:               501

Date/Time:             2015-06-03 14:34:10.539 +0800
OS Version:            Mac OS X 10.10.3 (14D136)
Report Version:        11
Anonymous UUID:        61F6CF4B-57AE-14A5-AB84-09F0CF72B0C6

Sleep/Wake UUID:       A8CB398E-1142-4EDF-9478-0B964BD166EF

Time Awake Since Boot: 160000 seconds
Time Since Wake:       49000 seconds

Crashed Thread:        21

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000110
...

Since the ka-lite server has stopped, I have to restart it (from the terminal or through the status menu icon).

When I tried to download a video again after restarting the ka-lite server, it only showed the "Launched video download process successfully." message, with no progress indicator but also no errors or crashes. See the screenshot below.

screenshot 2015-06-03 14 53 05

So I think we need to find out why pyrun2.7 crashed during the first attempt.

Does bin/kalite write to a log file somewhere, @benjaoming? If not, how can I pipe all output of the web server to a file, so we can see which commands were attempted on pyrun2.7 before the crash?
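One generic way to capture everything a server process prints is to launch it from a small wrapper that redirects stdout and stderr into a file. The sketch below is only an illustration: the helper name `run_and_log` and the log path are hypothetical, and whether a daemonizing command such as kalite start keeps writing to the inherited file handles is an assumption that would need checking.

```python
import subprocess
import sys


def run_and_log(cmd, log_path):
    """Run cmd, append its stdout and stderr to log_path, return the exit code."""
    with open(log_path, "ab") as log:
        # stderr=STDOUT merges both streams into the same file
        return subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)


# Hypothetical usage (the command list is an assumption):
# code = run_and_log(["kalite", "manage", "kaserve"], "/tmp/kalite-output.log")
```

On POSIX systems a negative return code means the child was killed by a signal, so a segfault (SIGSEGV, signal 11) would typically show up as -11 even when nothing was written to the log.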

@cpauya
Contributor

cpauya commented Jun 3, 2015

I ran these commands on the terminal (repeat setup with new database and do runserver):

kalite manage setup
kalite manage runserver

And I was able to download videos successfully. I also tried kalite manage kaserve and it also worked.

Here's a screenshot of the video download happening:
screenshot 2015-06-03 15 11 02

Then I ran kalite start and tried to download a video, which failed. kalite start is what the Mac installer uses.

So starting the server with kalite start makes the video download fail, but starting it with kalite manage runserver or kalite manage kaserve works!

Any ideas @benjaoming?

@benjaoming
Contributor

@cpauya
Could you post the exception/error message that you presumably have from the video downloads? You can find them on http://localhost:8008/admin/chronograph/log/.

@cpauya
Contributor

cpauya commented Jun 3, 2015

Hi @benjaoming, I noticed that the first attempt to download a video always crashes the server. Subsequent attempts to download the same video do not. After the second attempt, a Download Videos log appears in chronograph, like this:

screenshot 2015-06-03 21 49 01

Please note that there's no log of any sort for the first attempt, which crashes the server.

I'm not sure if this is related, since it concerns the Secure Sync log, but here's its last entry from the Chronograph log:

Stdout:

Checking purgatory for unsaved models...
Purgatory is model-free! Congrats!
Initiating SyncSession...
Syncing models...

Stderr:

'int' object is not callable

Traceback (most recent call last):
  File "/Applications/KA-Lite Monitor.app/Contents/Resources/ka-lite/python-packages/fle_utils/chronograph/models.py", line 209, in run_management_command
    call_command(self.command, *args, **options)
  File "/Applications/KA-Lite Monitor.app/Contents/Resources/ka-lite/python-packages/django/core/management/__init__.py", line 161, in call_command
    return klass.execute(*args, **defaults)
  File "/Applications/KA-Lite Monitor.app/Contents/Resources/ka-lite/python-packages/django/core/management/base.py", line 263, in execute
    output = self.handle(*args, **options)
  File "/Applications/KA-Lite Monitor.app/Contents/Resources/ka-lite/python-packages/securesync/management/commands/syncmodels.py", line 62, in handle
    results = client.sync_models()
  File "/Applications/KA-Lite Monitor.app/Contents/Resources/ka-lite/python-packages/securesync/engine/api_client.py", line 251, in sync_models
    counters_to_download, counters_to_upload = self.sync_device_records()
  File "/Applications/KA-Lite Monitor.app/Contents/Resources/ka-lite/python-packages/securesync/engine/api_client.py", line 162, in sync_device_records
    client_counters = self.get_client_device_counters()
  File "/Applications/KA-Lite Monitor.app/Contents/Resources/ka-lite/python-packages/securesync/engine/api_client.py", line 154, in get_client_device_counters
    return get_device_counters(zone=self.session.client_device.get_zone())
  File "/Applications/KA-Lite Monitor.app/Contents/Resources/ka-lite/python-packages/securesync/engine/utils.py", line 71, in get_device_counters
    device_counters[device.id] += cnt()
TypeError: 'int' object is not callable
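For readers unfamiliar with this error: it is raised when call syntax is applied to a name that is bound to an integer. The sketch below reproduces the same message in isolation. The function `total` and its inputs are hypothetical, and it says nothing about why `cnt` was an int in securesync/engine/utils.py.

```python
def total(counters):
    """Sum counters by calling each entry, as the failing line does with cnt()."""
    result = 0
    for cnt in counters:
        result += cnt()
    return result


# Works while every entry is callable
ok = total([lambda: 2, lambda: 3])

# Fails like the traceback once an entry is already a plain int
try:
    total([lambda: 2, 3])
except TypeError as exc:
    message = str(exc)
```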

@benjaoming
Contributor

@cpauya strange!

  1. After adding the job the very first time and failing to download, can you find the job here? http://localhost:8000/admin/chronograph/job/
  2. After adding the video again, will it be downloaded twice?

@cpauya
Contributor

cpauya commented Jun 3, 2015

@benjaoming,

Using kalite start:

  1. No job/log is created on first attempt because the server (pyrun2.7) crashed so I have to run kalite start again.
  2. For subsequent attempts on the same video, it won't crash the server but won't download the video either if I used kalite start.

But then again, using kalite manage kaserve will download the video successfully. Perhaps there's a difference between running kalite start versus kalite manage kaserve?

@benjaoming
Contributor

@cpauya kalite start invokes kaserve but also starts a background service with chronograph, a thread running separately from the HTTP server. The chronograph thread is the one that downloads files. Just running kaserve won't download any videos, just queue the job.
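The split described here, where the HTTP side only queues a job and a separate background thread does the work, can be sketched roughly as follows. This is an illustration and not KA Lite's chronograph code: the in-memory queue stands in for the job table, and all names are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()   # stands in for the chronograph job table (assumption)
downloaded = []        # records which jobs the worker has handled


def enqueue_download(video_id):
    """HTTP server side: record the job and return immediately."""
    jobs.put(video_id)


def worker(stop):
    """Background side: pick up queued jobs until asked to stop."""
    while not stop.is_set():
        try:
            video_id = jobs.get(timeout=0.1)
        except queue.Empty:
            continue
        downloaded.append(video_id)   # placeholder for the real download
        jobs.task_done()


def drain_with_worker():
    """Start a worker, wait until the queue is fully processed, then stop it."""
    stop = threading.Event()
    thread = threading.Thread(target=worker, args=(stop,))
    thread.start()
    jobs.join()
    stop.set()
    thread.join()
```

Without a running worker, a call to `enqueue_download` leaves the job sitting in the queue, which mirrors the "just queue the job" behaviour described above.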

If no jobs are created, this indicates that something is wrong with the process running the server. Are you able to get output from the crashing kalite start ?

@cpauya
Contributor

cpauya commented Jun 3, 2015

Sadly, I can't find any output or log. I looked through the crash report in Console, but there is nothing there.

I guess we have an issue with using pyrun2.7 instead of pure Python? Will dig deeper tomorrow.

@cpauya
Contributor

cpauya commented Jun 3, 2015

I tried this:

launchctl setenv KALITE_PYTHON "/Library/Frameworks/Python.framework/Versions/2.7/bin/python"

so that it uses my OS X Python 2.7.9 instead of pyrun. I restarted the terminal, re-ran kalite start, and it worked!
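An environment-variable override such as KALITE_PYTHON usually boils down to "use the variable if it is set, otherwise fall back to a default". How bin/kalite actually reads the variable is not shown in this thread, so the sketch below is only an assumption: the helper name and the fallback to the current interpreter are both hypothetical.

```python
import os
import sys


def pick_interpreter(environ=None):
    """Return KALITE_PYTHON if set and non-empty, else the current interpreter."""
    environ = os.environ if environ is None else environ
    # An empty string counts as "not set" and falls through to the default
    return environ.get("KALITE_PYTHON") or sys.executable
```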

So for now I would say the failing video download has something to do with the pyrun binary. Ok, really going to bed now.

@aronasorman
Collaborator

Heh, this is frustrating. The reason we moved to PyRun in the first place was to get away from a buggy MacPython. If it's also crashing on PyRun, then we had best just bundle the Python installer, similar to what the Windows installer does. Or perhaps get to the root of why it's segfaulting in the first place.

@benjaoming
Contributor

@aronasorman it's not confirmed yet that it's a Pyrun segfault but it does sound an awful lot like it :(

@cpauya sleep well, sounds like a problem worth sleeping on :)

@cpauya
Contributor

cpauya commented Jun 12, 2015

It's hard to trace this without a log file showing which part of the ka-lite code caused the crash. A logger like the one mentioned in the PR "Daemonize with Popen, remove redundant daemonize function" would greatly help trace this.

aronasorman added a commit that referenced this issue Jun 12, 2015
Workaround for #3704. Running videodownload on OS X results in a strange quitting behaviour by the python process if run as a thread. If run as a process, it should work fine, albeit it might take longer. Good enough for now.
@cpauya cpauya added the ready label Jun 24, 2015
@aronasorman aronasorman added has PR and removed ready labels Jun 24, 2015
@cpauya
Contributor

cpauya commented Jul 4, 2015

Update:

  1. A similar issue was filed as "For Mac OS, Video Downloading Issue" (#3990). I verified the same issue using build 165 of the dungeon server OSX installer and closed it as a duplicate.
  2. I get this error in .kalite/server.log when trying to download a video:
Process CommandProcess-4:
Traceback (most recent call last):
  File "<pyrun>/multiprocessing/process.py", line 258, in _bootstrap
  File "/Applications/KA-Lite Monitor.app/Contents/Resources/ka-lite/python-packages/fle_utils/django_utils/command.py", line 92, in run
    call_command(self.cmd, *self.args, **self.kwargs)
  File "/Applications/KA-Lite Monitor.app/Contents/Resources/ka-lite/python-packages/django/core/management/__init__.py", line 161, in call_command
    return klass.execute(*args, **defaults)
  File "/Applications/KA-Lite Monitor.app/Contents/Resources/ka-lite/python-packages/django/core/management/base.py", line 263, in execute
    output = self.handle(*args, **options)
  File "/Applications/KA-Lite Monitor.app/Contents/Resources/ka-lite/python-packages/fle_utils/chronograph/management/commands/cron.py", line 10, in handle
    for job in Job.objects.due():
  File "/Applications/KA-Lite Monitor.app/Contents/Resources/ka-lite/python-packages/django/db/models/query.py", line 123, in _result_iter
    self._fill_cache()
  File "/Applications/KA-Lite Monitor.app/Contents/Resources/ka-lite/python-packages/django/db/models/query.py", line 927, in _fill_cache
    self._result_cache.append(next(self._iter))
  File "/Applications/KA-Lite Monitor.app/Contents/Resources/ka-lite/python-packages/django/db/models/query.py", line 301, in iterator
    for row in compiler.results_iter():
  File "/Applications/KA-Lite Monitor.app/Contents/Resources/ka-lite/python-packages/django/db/models/sql/compiler.py", line 775, in results_iter
    for rows in self.execute_sql(MULTI):
  File "/Applications/KA-Lite Monitor.app/Contents/Resources/ka-lite/python-packages/django/db/models/sql/compiler.py", line 840, in execute_sql
    cursor.execute(sql, params)
  File "/Applications/KA-Lite Monitor.app/Contents/Resources/ka-lite/python-packages/django/db/backends/sqlite3/base.py", line 366, in execute
    six.reraise(utils.DatabaseError, utils.DatabaseError(*tuple(e.args)), sys.exc_info()[2])
  File "/Applications/KA-Lite Monitor.app/Contents/Resources/ka-lite/python-packages/django/db/backends/sqlite3/base.py", line 362, in execute
    return Database.Cursor.execute(self, query, params)
DatabaseError: no such table: chronograph_job

We are tracing why we are now having this issue.

@benjaoming
Contributor

Could you have a look in the database file itself? It's very strange that chronograph_job is missing, and it makes me wonder whether other tables are missing, too.

There's a great Firefox add-on for SQLite management: https://addons.mozilla.org/en-us/firefox/addon/sqlite-manager/
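If a GUI tool is not at hand, the table list can also be read with Python's built-in sqlite3 module. This is a minimal sketch: the helper name is hypothetical, and the location of the KA Lite database file is not given in this thread, so the path has to be supplied by the reader.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path, sorted."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Hypothetical usage:
# print("chronograph_job" in list_tables("/path/to/database.sqlite"))
```

Note that sqlite3.connect creates an empty database when the file does not exist, so an empty result can also mean the path is wrong.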

@mrpau-richard
Contributor

@cpauya, running kalite manage kaserve on the 0.14.x branch and downloading any video gives me this DatabaseError: disk I/O error. I think this is the reason I can't download videos.

screen shot 2015-07-07 at 7 32 47 pm

@aronasorman
Collaborator

@cpauya @amodia can you post your updates on this?

@cpauya
Contributor

cpauya commented Jul 17, 2015

We found that the call to call_command_subprocess() crashes under PyRun. This affects calls to cron, video downloads, language pack downloads, etc.

We saw that, according to its documentation, PyRun doesn't include the _multiprocessing standard library module:
screenshot 2015-07-17 01 52 47

So when running under PyRun, we have to modify fle_utils.django_utils.command.call_command_subprocess() so that it uses the subprocess module instead of the multiprocessing module.
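The idea of falling back to subprocess when the _multiprocessing extension is unavailable might look roughly like the sketch below. It is not the actual patch: both helper names are hypothetical, and using sys.executable to re-launch the interpreter is an assumption that may not hold under PyRun.

```python
import subprocess
import sys


def multiprocessing_available():
    """Return True if the C extension behind multiprocessing can be imported."""
    try:
        import _multiprocessing  # noqa: F401
    except ImportError:
        return False
    return True


def call_in_subprocess(args):
    """Run a fresh interpreter with args; return (exit code, combined output)."""
    proc = subprocess.Popen(
        [sys.executable] + list(args),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    out, _ = proc.communicate()
    return proc.returncode, out.decode("utf-8", "replace")
```

A caller could check `multiprocessing_available()` once at import time and route management commands through `call_in_subprocess` only when it returns False.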

Ideally we should build our own PyRun that includes the multiprocessing module, but that is tracked in a separate issue on the installers repo: learningequality/ka-lite-installers#100.

We are now testing our changes and will issue a PR for this soon.

@benjaoming
Contributor

Great work!!

The python-packages/fle_utils/django_utils/command.py module is in a sad state. Ideally we could abandon it in favor of a simpler approach.

Let's discuss a simple and low-risk approach for 0.14.

One of the sad things in python-packages/fle_utils/django_utils/command.py is that it declares call_command_subprocess which uses the subprocess library and then REDECLARES call_command_subprocess as a function that uses multiprocessing :/

One much simpler approach would be to use normal threads that are signalled to shut down by the main process, in the same fashion that bin/kalite runs cronserver_blocking and tells it to exit. This reduces the risk of leaked processes clogging memory and of leaked processes accessing the same files as newer processes.
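A thread that the main process can signal to shut down is commonly built around threading.Event. The sketch below shows that pattern under stated assumptions: the class name and the polling interval are hypothetical, and how bin/kalite actually tells cronserver_blocking to exit is not shown in this thread.

```python
import threading


class CronThread(threading.Thread):
    """Runs job repeatedly until shutdown() is called from another thread."""

    def __init__(self, job, interval=1.0):
        super().__init__()
        self.job = job
        self.interval = interval
        self._stop_requested = threading.Event()

    def run(self):
        while not self._stop_requested.is_set():
            self.job()
            # wait() returns early as soon as shutdown() sets the event
            self._stop_requested.wait(self.interval)

    def shutdown(self):
        """Ask the loop to exit and block until the thread has finished."""
        self._stop_requested.set()
        self.join()
```

Because the thread lives inside the server process, it ends when that process ends, which is the property that avoids leaked worker processes.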

@aronasorman
Collaborator

Should be fixed in the latest mac installer.
