Timeout waiting for IOPub output #426

Closed
gshiba opened this issue Sep 6, 2019 · 11 comments

@gshiba
Contributor

gshiba commented Sep 6, 2019

Hello!

Issue: When a cell produces output quickly enough to fill up the IOPub channel's ZMQ buffer, papermill (or nbconvert?) almost silently trims off the output and carries on to the next cell. I believe it should raise an error and exit with a non-zero code.

It appears this is addressed in jupyter/nbconvert#994, but I can't tell if the fix there will solve this issue.

Thank you!


To reproduce: Make a notebook with a single cell (adapted from jupyter/nbconvert#659 (comment)):

import sys
import time
str = '0'
for x in range(0, 10000):
    sys.stdout.write(str*100)
    sys.stdout.flush()
    time.sleep(0.0001)
print('hi')

Then execute it. A warning is printed, but the exit code is zero and 'hi' is not printed (see also the Python-API sketch after the output dump below).

$ papermill input.ipynb output.ipynb
Input Notebook:  /home/gosuke/tmp/input.ipynb
Output Notebook: output.ipynb
Executing:   0%|                                       | 0/2 [00:00<?, ?cell/s]
Timeout waiting for IOPub output
Executing: 100%|███████████████████████████████| 2/2 [00:27<00:00, 14.26s/cell]
$ echo $?
0
$ tail -n 60 output.ipynb
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
     ]
    }
   ],
   "source": [
    "import sys\n",
    "import time\n",
    "str = '0'\n",
    "for x in range(0, 10000):\n",
    "    sys.stdout.write(str*100)\n",
    "    sys.stdout.flush()\n",
    "    time.sleep(0.0001)\n",
    "print('hi')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:brick37]",
   "language": "python",
   "name": "conda-env-brick37-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  },
  "papermill": {
   "duration": 16.32673,
   "end_time": "2019-09-06T17:52:44.503019",
   "environment_variables": {},
   "exception": null,
   "input_path": "/home/gosuke/tmp/input.ipynb",
   "output_path": "output.ipynb",
   "parameters": {},
   "start_time": "2019-09-06T17:52:28.176289",
   "version": "1.1.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
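
For reference, the same behavior can be reproduced through papermill's Python API rather than the CLI. A minimal sketch (papermill.execute_notebook is the standard entry point; the paths are the files from above):

import papermill as pm

# At the time of this report, this call returns normally even though the
# cell's stream output was truncated after the IOPub timeout warning.
pm.execute_notebook(
    'input.ipynb',   # the single-cell notebook shown above
    'output.ipynb',
)
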
@MSeal
Member

MSeal commented Sep 7, 2019

Thanks for raising the issue and making a clearly reproducible case.

So the fix in jupyter/nbconvert#994 helps with the issue but doesn't make it impossible to occur. I can reproduce with the latest nbconvert, which includes the mentioned change. The part that's causing the issue is the sheer number of tiny messages per nbconvert cycle, which the zmq buffer can't absorb. If the message rate is reduced by sleeping longer:

import sys
import time
str = '0'
for x in range(0, 1000):
    sys.stdout.write(str*100)
    sys.stdout.flush()
    time.sleep(0.01)
print('hi')

or sys.stdout.flush() is commented out, then the message buffer stays within its limits and the cell executes as expected.

I've touched pretty much every part of the code up to the pyzmq layer now, and while we can make it slightly better, there is a fairly hard limit to the maximum message rate a kernel client can handle. It may be worth opening an issue on https://github.com/ipython/ipykernel to see if the kernel could apply backpressure or skip acting on sys flush calls, to prevent very high flush rates during kernel executions.

All that being said, I'd be amenable to making raise_on_iopub_timeout default to True in papermill, but I'd want to get @mpacer, @willingc, and @rgbkrk's opinions before we change the default for this field, in case they have a good objection to changing the failure mode.
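
For anyone hitting this before any default change lands, one workaround is to run the notebook through nbconvert directly and opt into the stricter behavior; raise_on_iopub_timeout and iopub_timeout are existing ExecutePreprocessor traits. A minimal sketch of what that might look like (paths and timeout values are placeholders):

import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

nb = nbformat.read('input.ipynb', as_version=4)

# With raise_on_iopub_timeout=True, a dropped IOPub message raises a
# TimeoutError instead of being logged as a warning and silently truncated.
ep = ExecutePreprocessor(
    timeout=600,
    iopub_timeout=10,
    raise_on_iopub_timeout=True,
)
ep.preprocess(nb, {'metadata': {'path': '.'}})

nbformat.write(nb, 'output.ipynb')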

@gshiba
Contributor Author

gshiba commented Sep 10, 2019

My original use case isn't necessarily lots of small messages, but rather lots of pngs. Something like the following:

from time import sleep  # needed for the sleep() call below
from IPython.display import display, Markdown, Image

len(directories)  # roughly 900 directories, defined earlier in the notebook
for d in directories:
    display(Markdown(f'# {d}'))
    for png in ['a.png', 'b.png', 'c.png']:  # Roughly 200KB, 200KB, and 50KB on disk
        display(Image(filename=f'{d}/{png}'))
    sleep(0.5)
x = 'hello'

# next cell
print(x)

The papermill output .ipynb file is ~120MB and shows PNGs for roughly 500 of the 900 directories, and 'hello' is printed in the next cell. When I run the same notebook interactively (through Chrome), the browser tab crashes on my PC.

Is there a limit on size (in bytes) somewhere as well?

Either way, I'd be happy with an error being raised for now.

@MSeal
Member

MSeal commented Sep 16, 2019

While there's no explicit limit on notebook output size, I would say that anything above 100MB is in the realm of "this will crash browsers". Papermill will actually handle very large notebooks better than browsers (it's only limited by the rate of messages), but the format still doesn't support large files well.

@MSeal
Member

MSeal commented Sep 16, 2019

Other papermill devs @mpacer @captainsafia @rgbkrk @willingc: on the topic of raising an error for the buffer-overload case, I think this would be a reasonable change, but it would differ from the default that nbconvert has held for a long time.

@rgbkrk
Member

rgbkrk commented Sep 18, 2019

Gosh, anything above 20 MB will hang most browsers.

@rgbkrk
Member

rgbkrk commented Sep 18, 2019

I think raising an error in papermill's case makes sense. How reproducible is the notebook if data is dropped?

@MSeal
Member

MSeal commented Sep 18, 2019

Highly reproducible as far as I could tell from my local tests.

@willingc
Member

I think raising an error makes sense. We can clarify in the docs and mention the difference in default behavior from nbconvert in the docs/docstring.

@MSeal
Member

MSeal commented Sep 19, 2019

I'll work on making that change then.

@MSeal MSeal added this to the Papermill 2.0 milestone Jan 23, 2020
jsvine added a commit to jsvine/nbexec that referenced this issue Jan 24, 2020
See the following links for details on the issues with IOPub and
nbconvert's ExecutePreprocessor:

- nteract/papermill#426 (comment)
- jupyter/nbconvert#994
@MSeal
Member

MSeal commented Feb 11, 2020

Change made in the 2.0 release (it was a single config line, so I skipped the PR).

@mirekphd

We started getting papermill.execute_notebook errors on a notebook cell that triggers 'IOPub message rate exceeded.':

A cell timed out while it was being executed, after 4 seconds.
The message was: Timeout waiting for IOPub output.

This is almost certainly caused by this change from v2.0.0 finally kicking in:
IOPub timeouts now raise an exception instead of a warning.
[ https://papermill.readthedocs.io/en/latest/changelog.html ]
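
Under the 2.x behavior, callers that previously relied on the silent warning now need to handle the failure. A rough sketch of one way to do that (the exact exception class isn't pinned down here, only the message quoted above; the real remediation is usually to reduce the cell's output rate, e.g. by dropping explicit flush calls or batching output):

import papermill as pm

try:
    pm.execute_notebook('input.ipynb', 'output.ipynb')
except Exception as exc:
    # The message text is distinctive enough to tell an IOPub timeout
    # apart from an ordinary cell error.
    if 'Timeout waiting for IOPub output' in str(exc):
        print('Cell output outpaced the IOPub channel; reduce the output rate')
    raise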
