Commit

Merge pull request #999 from trungleduc/ft/support-query-variable
Add support for query variables in preheat kernel mode
SylvainCorlay authored Oct 14, 2021
2 parents 88d55ea + b23e05c commit a03f0af
Showing 10 changed files with 231 additions and 42 deletions.
65 changes: 53 additions & 12 deletions docs/source/customize.rst
@@ -313,24 +313,24 @@ There is also the ``MappingKernelManager.cull_busy`` and ``MappingKernelManager.

For more information about these options, check out the `Jupyter Server <https://jupyter-server.readthedocs.io/en/latest/other/full-config.html#options>`_ documentation.

Pre-heat kernels
=================
Preheated kernels
==================

Since Voilà needs to start a new jupyter kernel and execute the requested notebook in this kernel for every connection, this would lead to a long waiting time before the widgets can be displayed in browser.
To reduce this waiting time, especially for the heavy notebooks, user can use the pre-heating kernel option of Voilà, this option will enable two features:
Because Voilà must start a new Jupyter kernel and execute the requested notebook in it for every connection, users may wait a long time before the widgets are displayed in the browser.
To reduce this waiting time, especially for heavy notebooks, users can activate the preheating kernel option of Voilà. This option enables two features:

- A pool of kernels is started for each notebook and kept in standby, then the notebook is executed in every kernel of its pool. When a new client requests a kernel, the pre-heated kernel in this pool is used and another kernel is started asynchronously to refill the pool.
- The HTML version of notebook is rendered in each pre-heated kernel and stored, when a client connects to Voila, under some conditions, the cached HTML is served instead of re-rendering the notebook.
- A pool of kernels is started for each notebook and kept on standby, and the notebook is executed in every kernel of its pool. When a new client requests a kernel, a preheated kernel from this pool is used and another kernel is started asynchronously to refill the pool.
- The HTML version of the notebook is rendered in each preheated kernel and stored. When a client connects to Voilà, under some conditions, the cached HTML is served instead of re-rendering the notebook.
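
The pool-and-refill behaviour described in the first bullet can be illustrated with a toy model. This is only a sketch under stated assumptions: `KernelPoolSketch` and its methods are hypothetical names, "kernels" are plain strings, and none of this is Voilà's actual kernel manager code.

```python
import asyncio
import itertools


class KernelPoolSketch:
    """Toy model of a preheated pool (hypothetical, not Voilà's API)."""

    def __init__(self, pool_size):
        self._ids = itertools.count()
        self._pool_size = pool_size
        self._pool = []
        self._refill_tasks = set()

    async def _start_and_run_notebook(self):
        # Stand-in for starting a kernel and executing the notebook in it.
        await asyncio.sleep(0)
        return f"kernel-{next(self._ids)}"

    async def fill(self):
        # Preheat kernels until the pool reaches its configured size.
        while len(self._pool) < self._pool_size:
            self._pool.append(await self._start_and_run_notebook())

    async def _refill_one(self):
        self._pool.append(await self._start_and_run_notebook())

    async def acquire(self):
        if not self._pool:
            # Pool exhausted: fall back to a freshly started kernel.
            return await self._start_and_run_notebook()
        kernel = self._pool.pop(0)
        # Refill asynchronously so the caller is not kept waiting.
        task = asyncio.create_task(self._refill_one())
        self._refill_tasks.add(task)
        task.add_done_callback(self._refill_tasks.discard)
        return kernel

    async def wait_refilled(self):
        if self._refill_tasks:
            await asyncio.gather(*self._refill_tasks)

    def size(self):
        return len(self._pool)


async def demo():
    pool = KernelPoolSketch(pool_size=2)
    await pool.fill()
    first = await pool.acquire()
    await pool.wait_refilled()
    return first, pool.size()
```

Running `asyncio.run(demo())` hands out the first preheated kernel immediately and leaves the pool back at its configured size once the background refill has finished.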

The pre-heat kernel option works with any kernel manager, it is deactivated by default, re-activate it by setting `preheat_kernel = True`. For example, with this command, for each notebook Voilà started with, a pool of 5 kernels is created and will be used for new connections.
The preheating kernel option works with any kernel manager. It is deactivated by default; activate it by setting `preheat_kernel = True`. For example, with the following command, a pool of 5 kernels is created for each notebook Voilà is started with, and these kernels are used for new connections.

.. code-block:: bash

    voila --preheat_kernel=True --pool_size=5

If the pool size does not match the user's requirements, or some notebooks need to use environment variables..., additional settings are needed. The easiest way to change these settings is to provide a file named `voila.json` in the same folder containing the notebooks. Settings for pre-heat kernel ( list of notebooks does not need pre-heated kernels, number of kernels in pool, refilling delay, environment variables for starting kernel...) can be set under the `VoilaKernelManager` class name.
If the pool size does not match the user's requirements, or if some notebooks need environment variables, additional settings are needed. The easiest way to change these settings is to provide a file named `voila.json` in the same folder as the notebooks. Settings for preheating kernels (the list of notebooks that do not need preheated kernels, the number of kernels in the pool, the refilling delay, the environment variables for starting kernels...) can be set under the `VoilaKernelManager` class name.

Here is an example of settings with explanations for pre-heat kernel option.
Here is an example of settings, with explanations, for the preheating kernel option.

.. code-block:: python
@@ -374,14 +374,55 @@ Here is an example of settings with explanations for pre-heat kernel option.
}
}
Notebook HTML will be pre-rendered with template and theme defined in VoilaConfiguration or in notebook metadata. The pre-heated kernel and cached HTML are used if these conditions are matched:
Notebook HTML will be pre-rendered with the template and theme defined in VoilaConfiguration or in the notebook metadata. The preheated kernel and cached HTML are used if these conditions are met:

- There is an available pre-heated kernel in the kernel pool.
- There is an available preheated kernel in the kernel pool.
- If the user overrides the template/theme with a query string, it must match the template/theme used to pre-render the notebook.
- There are no query strings other than `voila-theme` and `voila-template`.

If the kernel pool is empty or the request does not match these conditions, Voilà will fall back to starting a normal kernel and rendering the notebook as usual.

Partially pre-render notebook
------------------------------

To benefit from the acceleration of preheating kernel mode, notebooks need to be pre-rendered before users actually connect to Voilà. But in many real-world cases, the notebook requires some user-specific data to render the widgets correctly, which makes pre-rendering impossible. To overcome this limitation, Voilà offers a feature that handles the most common method of providing user data: the URL `query string`.

In normal mode, Voilà users can get the `query string` at run time through the ``QUERY_STRING`` environment variable:

.. code-block:: python

    import os
    query_string = os.getenv('QUERY_STRING')

In preheating kernel mode, users can just replace the ``os.getenv`` call with the helper ``get_query_string`` from ``voila.utils``:

.. code-block:: python

    from voila.utils import get_query_string
    query_string = get_query_string()

``get_query_string`` pauses the execution of the notebook in the preheated kernel at this cell and waits for an actual user to connect to Voilà. It then returns the URL `query string`, and execution continues with the remaining cells.
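
Once the query string is available, it usually has to be split into individual parameters. The following is a minimal sketch using only the standard library; `parse_query_string` is a hypothetical helper name, and a literal string stands in for the value that ``get_query_string()`` would return so that the example runs outside Voilà.

```python
from urllib.parse import parse_qs


def parse_query_string(query_string):
    """Return a dict mapping each parameter to its first value."""
    # parse_qs returns a list per key; keep_blank_values retains `?flag=`.
    parsed = parse_qs(query_string or "", keep_blank_values=True)
    return {key: values[0] for key, values in parsed.items()}


# In a notebook, the argument would come from get_query_string();
# a literal is used here so the sketch runs on its own.
params = parse_query_string("username=alice&dataset=sales")
```

Cells placed after this point can then build user-specific widgets from `params`, while the cells before it have already been executed in the preheated kernel.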

If the Voilà websocket handler is not started with the default protocol (`ws`), the default IP address (`127.0.0.1`) or the default port (`8866`), users need to provide these values through the environment variables ``VOILA_APP_PROTOCOL``, ``VOILA_APP_IP`` and ``VOILA_APP_PORT``. The easiest way is to set these variables in the `voila.json` configuration file, for example:

.. code-block:: python

    # voila.json
    {
      ...
      "VoilaKernelManager": {
        "kernel_pools_config": {
          "foo.ipynb": {
            "kernel_env_variables": {
              "VOILA_APP_IP": "192.168.1.1",
              "VOILA_APP_PORT": "6789",
              "VOILA_APP_PROTOCOL": "wss"
            }
          }
        },
        ...
      }
    }

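
To make the role of these variables concrete, here is a rough sketch of how a notebook-side helper might assemble the websocket address from them. It is an assumption-laden illustration, not the actual ``get_query_string`` implementation (which is not shown in this commit): `build_query_socket_url` is a hypothetical name, the ``/voila/query/<kernel_id>`` path is taken from the route registered in ``voila/app.py``, and the server is assumed to be mounted at the root URL.

```python
import os


def build_query_socket_url(env=None):
    """Assemble the query-string websocket URL (hypothetical helper)."""
    env = os.environ if env is None else env
    # Defaults mirror the ones quoted in the documentation text.
    protocol = env.get("VOILA_APP_PROTOCOL", "ws")
    ip = env.get("VOILA_APP_IP", "127.0.0.1")
    port = env.get("VOILA_APP_PORT", "8866")
    # VOILA_KERNEL_ID is set in the kernel by the notebook renderer.
    kernel_id = env["VOILA_KERNEL_ID"]
    # Assumption: the server is served from "/", with no base URL prefix.
    return f"{protocol}://{ip}:{port}/voila/query/{kernel_id}"


example_env = {
    "VOILA_APP_IP": "192.168.1.1",
    "VOILA_APP_PORT": "6789",
    "VOILA_APP_PROTOCOL": "wss",
    "VOILA_KERNEL_ID": "aaaa-bbbb-cccc-dddd-eeee",
}
url = build_query_socket_url(example_env)
```

With the settings from the `voila.json` example, the kernel would connect over `wss` to `192.168.1.1:6789` instead of the default `ws://127.0.0.1:8866`.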
Hiding output and code cells based on cell tags
===============================================

1 change: 1 addition & 0 deletions setup.cfg
@@ -36,6 +36,7 @@ install_requires =
jupyter_client>=6.1.3,<8
nbclient>=0.4.0,<0.6
nbconvert>=6.0.0,<7
websockets>=9.0

[options.extras_require]
dev =
4 changes: 2 additions & 2 deletions tests/app/preheat_activation_test.py
@@ -71,13 +71,13 @@ async def test_render_time_with_multiple_requests(http_server_client,
async def test_request_with_query(http_server_client, base_url):
"""
We send a request with a query parameter; the preheated kernel should
be disable is this case.
be activated.
"""
url = f'{base_url}?foo=bar'
time, _ = await send_request(sc=http_server_client,
url=url,
wait=NOTEBOOK_EXECUTION_TIME + 1)
assert time > TIME_THRESHOLD
assert time < TIME_THRESHOLD


async def test_request_with_theme_parameter(http_server_client, base_url):
10 changes: 1 addition & 9 deletions tests/notebooks/preheat/pre_heat.ipynb
@@ -22,14 +22,6 @@
"import os\n",
"os.getenv('foo')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0e689ec5-708c-4cac-98ba-02b00411e41d",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -48,7 +40,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
"version": "3.9.7"
}
},
"nbformat": 4,
12 changes: 10 additions & 2 deletions voila/app.py
@@ -61,6 +61,7 @@
from .exporter import VoilaExporter
from .shutdown_kernel_handler import VoilaShutdownKernelHandler
from .voila_kernel_manager import voila_kernel_manager_factory
from .query_parameters_handler import QueryStringSocketHandler

_kernel_id_regex = r"(?P<kernel_id>\w+-\w+-\w+-\w+-\w+)"

@@ -423,7 +424,7 @@ def start(self):
self.voila_configuration.multi_kernel_manager_class,
preheat_kernel,
pool_size
)
)
self.kernel_manager = kernel_manager_class(
parent=self,
connection_dir=self.connection_dir,
@@ -483,6 +484,13 @@ def start(self):
(url_path_join(self.server_url, r'/voila/api/shutdown/(.*)'), VoilaShutdownKernelHandler)
])

if preheat_kernel:
handlers.append(
(
url_path_join(self.server_url, r'/voila/query/%s' % _kernel_id_regex),
QueryStringSocketHandler
)
)
# Serving notebook extensions
if self.voila_configuration.enable_nbextensions:
handlers.append(
@@ -533,7 +541,7 @@ def start(self):
'template_paths': self.template_paths,
'config': self.config,
'voila_configuration': self.voila_configuration
}),
}),
])

self.app.add_handlers('.*$', handlers)
28 changes: 14 additions & 14 deletions voila/handler.py
@@ -20,6 +20,8 @@

from ._version import __version__
from .notebook_renderer import NotebookRenderer
from .query_parameters_handler import QueryStringSocketHandler
from .utils import ENV_VARIABLE


class VoilaHandler(JupyterHandler):
@@ -45,17 +47,16 @@ async def get(self, path=None):

# Adding request uri to kernel env
kernel_env = os.environ.copy()
kernel_env['SCRIPT_NAME'] = self.request.path
kernel_env[ENV_VARIABLE.SCRIPT_NAME] = self.request.path
kernel_env[
'PATH_INFO'
ENV_VARIABLE.PATH_INFO
] = '' # would be /foo/bar if voila.ipynb/foo/bar was supported
kernel_env['QUERY_STRING'] = str(self.request.query)
kernel_env['SERVER_SOFTWARE'] = 'voila/{}'.format(__version__)
kernel_env['SERVER_PROTOCOL'] = str(self.request.version)
kernel_env[ENV_VARIABLE.QUERY_STRING] = str(self.request.query)
kernel_env[ENV_VARIABLE.SERVER_SOFTWARE] = 'voila/{}'.format(__version__)
kernel_env[ENV_VARIABLE.SERVER_PROTOCOL] = str(self.request.version)
host, port = split_host_and_port(self.request.host.lower())
kernel_env['SERVER_PORT'] = str(port) if port else ''
kernel_env['SERVER_NAME'] = host

kernel_env[ENV_VARIABLE.SERVER_PORT] = str(port) if port else ''
kernel_env[ENV_VARIABLE.SERVER_NAME] = host
# Add HTTP Headers as env vars following rfc3875#section-4.1.18
if len(self.voila_configuration.http_header_envs) > 0:
for header_name in self.request.headers:
@@ -92,10 +93,12 @@ async def get(self, path=None):
# Get the pre-rendered content of notebook, the result can be all rendered cells
# of the notebook or some rendered cells and a generator which can be used by this
# handler to continue rendering calls.
render_task, rendered_cache = await self.kernel_manager.get_rendered_notebook(

render_task, rendered_cache, kernel_id = await self.kernel_manager.get_rendered_notebook(
notebook_name=notebook_path,
)

QueryStringSocketHandler.send_updates({'kernel_id': kernel_id, 'payload': self.request.query})
# Send rendered cell to frontend
if len(rendered_cache) > 0:
self.write(''.join(rendered_cache))
@@ -139,6 +142,8 @@ def time_out():
self.write('<script>voila_heartbeat()</script>\n')
self.flush()

kernel_env[ENV_VARIABLE.VOILA_PREHEAT] = 'False'
kernel_env[ENV_VARIABLE.VOILA_BASE_URL] = self.base_url
kernel_id = await ensure_async(
(
self.kernel_manager.start_kernel(
@@ -180,10 +185,5 @@ def should_use_rendered_notebook(
return False
if theme is not None and rendered_theme != theme:
return False
args_list = [
key for key in request_args if key not in ['voila-template', 'voila-theme']
]
if len(args_list) > 0:
return False

return True
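
With the `args_list` check removed, extra query parameters no longer disqualify a request from using the pre-rendered notebook; only a template or theme mismatch does. A stand-alone sketch of the resulting decision follows. The function name and parameter names are assumptions for illustration (the real signature of `should_use_rendered_notebook` is only partially visible in this hunk), and the template check is inferred from the visible theme check.

```python
def can_use_cached_render(rendered_template, rendered_theme,
                          template=None, theme=None):
    """Decide whether a cached render matches the request (illustrative)."""
    # A template override must match the template used for pre-rendering.
    if template is not None and rendered_template != template:
        return False
    # Likewise for a theme override.
    if theme is not None and rendered_theme != theme:
        return False
    # Other query parameters (for example `?foo=bar`) are no longer checked.
    return True
```

This matches the updated `test_request_with_query` test, which now expects a request such as `?foo=bar` to be served faster than `TIME_THRESHOLD`.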
14 changes: 13 additions & 1 deletion voila/notebook_renderer.py
@@ -25,6 +25,7 @@
from .execute import VoilaExecutor, strip_code_cell_warnings
from .exporter import VoilaExporter
from .paths import collect_template_paths
from .utils import ENV_VARIABLE


class NotebookRenderer(LoggingConfigurable):
@@ -221,7 +222,18 @@ async def _jinja_kernel_start(self, nb, kernel_id, kernel_future):
self.executor.kc.wait_for_ready(timeout=self.executor.startup_timeout)
)
self.executor.kc.allow_stdin = False
###
# Set the `VOILA_KERNEL_ID` environment variable; this variable helps users
# identify which kernel the notebook uses.
if nb.metadata.kernelspec['language'] == 'python':
await ensure_async(
self.executor.kc.execute(
f'''import os
\nos.environ["{ENV_VARIABLE.VOILA_KERNEL_ID}"]="{kernel_id}"
''',
store_history=False,
)
)

self.kernel_started = True
return kernel_id

61 changes: 61 additions & 0 deletions voila/query_parameters_handler.py
@@ -0,0 +1,61 @@
from tornado.websocket import WebSocketHandler
import logging
from typing import Dict


class QueryStringSocketHandler(WebSocketHandler):
    """A websocket handler used to provide the query string
    associated with kernel ids in preheat kernel mode.

    Class variables
    ---------------
    - _waiters : A dictionary which holds the `websocket` connection
        associated with the kernel id.
    - _cache : A dictionary which holds the query string associated
        with the kernel id.
    """
    _waiters = dict()
    _cache = dict()

    def open(self, kernel_id: str) -> None:
        """Create a new websocket connection, this connection is
        identified by the kernel id.

        Args:
            kernel_id (str): Kernel id used by the notebook when it opens
            the websocket connection.
        """
        QueryStringSocketHandler._waiters[kernel_id] = self
        # If the query string arrived before this connection was opened,
        # deliver the cached value immediately.
        if kernel_id in self._cache:
            self.write_message(self._cache[kernel_id])

    def on_close(self) -> None:
        # Remove only the entry that belongs to this connection, if any.
        # Iterating over a copy avoids mutating the dict during iteration.
        for k_id, waiter in list(QueryStringSocketHandler._waiters.items()):
            if waiter is self:
                del QueryStringSocketHandler._waiters[k_id]
                break

    @classmethod
    def send_updates(cls: 'QueryStringSocketHandler', msg: Dict) -> None:
        """Class method used to dispatch the query string to the waiting
        notebook. This method is called in `VoilaHandler` when the query
        string becomes available.
        If this method is called before the opening of websocket connection,
        `msg` is stored in `_cache` and the message will be dispatched when
        a notebook with corresponding kernel id is connected.

        Args:
            - msg (Dict): this dictionary contains the `kernel_id` to identify
            the waiting notebook and `payload` is the query string.
        """
        kernel_id = msg['kernel_id']
        payload = msg['payload']
        waiter = cls._waiters.get(kernel_id, None)
        if waiter is not None:
            try:
                waiter.write_message(payload)
            except Exception:
                logging.error("Error sending message", exc_info=True)

        cls._cache[kernel_id] = payload
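
The handler above has to cope with two possible orderings: the notebook may open its websocket before the user's request arrives, or after. The sketch below isolates that rendezvous logic without Tornado so it can be run directly. `QueryStringRendezvous` and the `deliver` callback are hypothetical stand-ins for the handler class and `write_message`; this is an illustration of the pattern, not a replacement for the handler.

```python
class QueryStringRendezvous:
    """Framework-free model of the waiter/cache pattern (illustrative)."""

    def __init__(self):
        self._waiters = {}  # kernel_id -> callable that delivers a payload
        self._cache = {}    # kernel_id -> last payload seen

    def open(self, kernel_id, deliver):
        # The notebook connects: register it, then replay any cached payload.
        self._waiters[kernel_id] = deliver
        if kernel_id in self._cache:
            deliver(self._cache[kernel_id])

    def send_update(self, kernel_id, payload):
        # The user's request arrives: deliver now if the notebook is waiting,
        # and cache the payload in case the notebook connects later.
        deliver = self._waiters.get(kernel_id)
        if deliver is not None:
            deliver(payload)
        self._cache[kernel_id] = payload


received = []
rendezvous = QueryStringRendezvous()

# Ordering 1: the request arrives first, the notebook connects afterwards.
rendezvous.send_update("k1", "foo=bar")
rendezvous.open("k1", received.append)

# Ordering 2: the notebook is already waiting when the request arrives.
rendezvous.open("k2", received.append)
rendezvous.send_update("k2", "a=1")
```

In both orderings the payload is delivered exactly once, which is the property the `_waiters` and `_cache` dictionaries provide together.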