Merge pull request #999 from trungleduc/ft/support-query-variable

Add support for query variables in preheat kernel mode
voila-dashboards · Oct 14, 2021 · a03f0af
2 parents 88d55ea + b23e05c
commit a03f0af

Showing 10 changed files with 231 additions and 42 deletions.
diff --git a/docs/source/customize.rst b/docs/source/customize.rst
@@ -313,24 +313,24 @@ There is also the ``MappingKernelManager.cull_busy`` and ``MappingKernelManager.
 
 For more information about these options, check out the `Jupyter Server <https://jupyter-server.readthedocs.io/en/latest/other/full-config.html#options>`_ documentation.
 
-Pre-heat kernels
-=================
+Preheated kernels
+==================
 
-Since Voilà needs to start a new jupyter kernel and execute the requested notebook in this kernel for every connection, this would lead to a long waiting time before the widgets can be displayed in browser. 
-To reduce this waiting time, especially for the heavy notebooks, user can use the pre-heating kernel option of Voilà, this option will enable two features:
+Because Voilà starts a new Jupyter kernel and executes the requested notebook in it for every connection, users can wait a long time before the widgets are displayed in the browser.
+To reduce this waiting time, especially for heavy notebooks, users can activate the preheating kernel option of Voilà. This option enables two features:
 
-- A pool of kernels is started for each notebook and kept in standby, then the notebook is executed in every kernel of its pool. When a new client requests a kernel, the pre-heated kernel in this pool is used and another kernel is started asynchronously to refill the pool.
-- The HTML version of notebook is rendered in each pre-heated kernel and stored, when a client connects to Voila, under some conditions, the cached HTML is served instead of re-rendering the notebook.
+- A pool of kernels is started for each notebook and kept on standby, and the notebook is executed in every kernel of its pool. When a new client requests a kernel, a preheated kernel from this pool is used and another kernel is started asynchronously to refill the pool.
+- The HTML version of the notebook is rendered in each preheated kernel and stored. When a client connects to Voilà, under some conditions, the cached HTML is served instead of re-rendering the notebook.
 
-The pre-heat kernel option works with any kernel manager, it is deactivated by default, re-activate it by setting `preheat_kernel = True`.  For example, with this command, for each notebook Voilà started with, a pool of 5 kernels is created and will be used for new connections.
+The preheating kernel option works with any kernel manager. It is deactivated by default; activate it by setting ``preheat_kernel = True``. For example, with the following command, a pool of 5 kernels is created for each notebook Voilà is started with, and these kernels are used for new connections.
 
 .. code-block:: bash
 
     voila --preheat_kernel=True --pool_size=5
 
-If the pool size does not match the user's requirements, or some notebooks need to use environment variables..., additional settings are needed.  The easiest way to change these settings is to provide a file named `voila.json` in the same folder containing the notebooks. Settings for pre-heat kernel ( list of notebooks does not need pre-heated kernels, number of kernels in pool, refilling delay, environment variables for starting kernel...) can be set under the `VoilaKernelManager` class name.
+If the pool size does not match the user's requirements, or some notebooks need specific environment variables, additional settings are needed. The easiest way to change these settings is to provide a file named ``voila.json`` in the folder containing the notebooks. Settings for preheated kernels (the list of notebooks that do not need preheated kernels, the number of kernels in each pool, the refill delay, the environment variables for starting kernels, and so on) can be set under the ``VoilaKernelManager`` class name.
 
-Here is an example of settings with explanations for pre-heat kernel option. 
+Here is an example of settings with explanations for the preheating kernel option.
 
 .. code-block:: python
 
@@ -374,14 +374,55 @@ Here is an example of settings with explanations for pre-heat kernel option.
       }
    }
 
-Notebook HTML will be pre-rendered with template and theme defined in VoilaConfiguration or in notebook metadata. The pre-heated kernel and cached HTML are used if these conditions are matched:
+Notebook HTML is pre-rendered with the template and theme defined in ``VoilaConfiguration`` or in the notebook metadata. The preheated kernel and cached HTML are used if these conditions are met:
 
-- There is an available pre-heated kernel in the kernel pool.
+- There is an available preheated kernel in the kernel pool.
 - If user overrides the template/theme with query string, it must match the template/theme used to pre-render the notebook.
-- There is no other query strings than `voila-theme` and `voila-template`.
 
 If the kernel pool is empty or the request does not match these conditions, Voila will fail back to start a normal kernel and render the notebook as usual.
 
+Partially pre-render notebook
+------------------------------
+
+To benefit from the speed-up of preheating kernel mode, notebooks need to be pre-rendered before users actually connect to Voilà. In many real-world cases, however, the notebook requires user-specific data to render the widgets correctly, which makes full pre-rendering impossible. To overcome this limit, Voilà supports the most common way of providing user data: the URL query string.
+
+In normal mode, Voilà users can get the query string at run time through the ``QUERY_STRING`` environment variable:
+
+.. code-block:: python
+
+   import os
+   query_string = os.getenv('QUERY_STRING') 
+
+In preheating kernel mode, users can replace the ``os.getenv`` call with the helper ``get_query_string`` from ``voila.utils``:
+
+.. code-block:: python
+
+   from voila.utils import get_query_string
+   query_string = get_query_string()
+
+``get_query_string`` pauses the execution of the notebook in the preheated kernel at this cell and waits for an actual user to connect to Voilà. It then returns the URL query string, and the execution of the remaining cells continues.
+
+If the Voilà websocket handler is not started with the default protocol (``ws``), the default IP address (``127.0.0.1``) or the default port (``8866``), users need to provide these values through the environment variables ``VOILA_APP_PROTOCOL``, ``VOILA_APP_IP`` and ``VOILA_APP_PORT``. The easiest way is to set these variables in the ``voila.json`` configuration file, for example:
+
+.. code-block:: python
+
+   # voila.json
+   {
+      ...
+      "VoilaKernelManager": {
+         "kernel_pools_config": { 
+            "foo.ipynb": {
+               "kernel_env_variables": { 
+                  "VOILA_APP_IP": "192.168.1.1",
+                  "VOILA_APP_PORT": "6789",
+                  "VOILA_APP_PROTOCOL": "wss"
+               }
+            }
+         },
+      ...
+      }
+   }
+
 Hiding output and code cells based on cell tags
 ===============================================
 

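The documentation above describes how a notebook obtains the raw query string, either through `os.getenv('QUERY_STRING')` or through `get_query_string()`. As a minimal sketch of what a notebook might do next, the example below turns such a string into a dictionary with the standard library. The helper name `parse_query` and the sample parameters are illustrative assumptions and are not part of Voilà.

```python
from urllib.parse import parse_qs


def parse_query(query_string):
    """Return a dict mapping each parameter to its last value.

    Hypothetical helper, not part of Voilà.
    """
    if not query_string:
        return {}
    # parse_qs returns a list of values per key; keep only the last one.
    return {key: values[-1] for key, values in parse_qs(query_string).items()}


# In a notebook, `raw` would come from os.getenv('QUERY_STRING')
# or from voila.utils.get_query_string(); here it is a made-up sample.
raw = "username=alice&lang=en"
params = parse_query(raw)
print(params)
```

Keeping the parsing in a small function makes it easy to switch between the two ways of obtaining the raw string without touching the rest of the notebook.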
diff --git a/setup.cfg b/setup.cfg
@@ -36,6 +36,7 @@ install_requires =
     jupyter_client>=6.1.3,<8
     nbclient>=0.4.0,<0.6
     nbconvert>=6.0.0,<7
+    websockets>=9.0
 
 [options.extras_require]
 dev =

diff --git a/tests/app/preheat_activation_test.py b/tests/app/preheat_activation_test.py
@@ -71,13 +71,13 @@ async def test_render_time_with_multiple_requests(http_server_client,
 async def test_request_with_query(http_server_client, base_url):
     """
     We sent request with query parameter, preheat kernel should
-    be disable is this case.
+    be activated.
     """
     url = f'{base_url}?foo=bar'
     time, _ = await send_request(sc=http_server_client,
                                  url=url,
                                  wait=NOTEBOOK_EXECUTION_TIME + 1)
-    assert time > TIME_THRESHOLD
+    assert time < TIME_THRESHOLD
 
 
 async def test_request_with_theme_parameter(http_server_client, base_url):

diff --git a/tests/notebooks/preheat/pre_heat.ipynb b/tests/notebooks/preheat/pre_heat.ipynb
@@ -22,14 +22,6 @@
     "import os\n",
     "os.getenv('foo')"
    ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "0e689ec5-708c-4cac-98ba-02b00411e41d",
-   "metadata": {},
-   "outputs": [],
-   "source": []
   }
  ],
  "metadata": {
@@ -48,7 +40,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.6"
+   "version": "3.9.7"
   }
  },
  "nbformat": 4,

diff --git a/voila/app.py b/voila/app.py
@@ -61,6 +61,7 @@
 from .exporter import VoilaExporter
 from .shutdown_kernel_handler import VoilaShutdownKernelHandler
 from .voila_kernel_manager import voila_kernel_manager_factory
+from .query_parameters_handler import QueryStringSocketHandler
 
 _kernel_id_regex = r"(?P<kernel_id>\w+-\w+-\w+-\w+-\w+)"
 
@@ -423,7 +424,7 @@ def start(self):
             self.voila_configuration.multi_kernel_manager_class,
             preheat_kernel,
             pool_size
-            )
+        )
         self.kernel_manager = kernel_manager_class(
             parent=self,
             connection_dir=self.connection_dir,
@@ -483,6 +484,13 @@ def start(self):
             (url_path_join(self.server_url, r'/voila/api/shutdown/(.*)'), VoilaShutdownKernelHandler)
         ])
 
+        if preheat_kernel:
+            handlers.append(
+                (
+                    url_path_join(self.server_url, r'/voila/query/%s' % _kernel_id_regex),
+                    QueryStringSocketHandler
+                )
+            )
         # Serving notebook extensions
         if self.voila_configuration.enable_nbextensions:
             handlers.append(
@@ -533,7 +541,7 @@ def start(self):
                      'template_paths': self.template_paths,
                      'config': self.config,
                      'voila_configuration': self.voila_configuration
-                 }),
+                }),
             ])
 
         self.app.add_handlers('.*$', handlers)
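
The handler registered above is served under `/voila/query/<kernel_id>`. The sketch below shows how the kernel id pattern from `voila/app.py` matches such a path, and how a client inside the kernel might assemble the websocket URL from the `VOILA_APP_PROTOCOL`, `VOILA_APP_IP` and `VOILA_APP_PORT` variables described in the documentation. The helper `query_socket_url` is hypothetical, and the sketch assumes the server URL prefix is `/`; the actual client code in Voilà may differ.

```python
import os
import re

# Same pattern as `_kernel_id_regex` in voila/app.py.
KERNEL_ID_REGEX = r"(?P<kernel_id>\w+-\w+-\w+-\w+-\w+)"
ROUTE = re.compile(r"/voila/query/%s$" % KERNEL_ID_REGEX)


def query_socket_url(kernel_id, env=None):
    """Build the websocket URL for a kernel id (hypothetical helper).

    Falls back to the documented defaults: ws, 127.0.0.1 and 8866.
    """
    env = os.environ if env is None else env
    protocol = env.get("VOILA_APP_PROTOCOL", "ws")
    ip = env.get("VOILA_APP_IP", "127.0.0.1")
    port = env.get("VOILA_APP_PORT", "8866")
    return f"{protocol}://{ip}:{port}/voila/query/{kernel_id}"


# Made-up kernel id, used only for illustration.
sample_id = "1234abcd-0000-0000-0000-000000000000"
url = query_socket_url(sample_id, env={})
match = ROUTE.search(url)
print(url)
print(match.group("kernel_id"))
```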

diff --git a/voila/handler.py b/voila/handler.py
@@ -20,6 +20,8 @@
 
 from ._version import __version__
 from .notebook_renderer import NotebookRenderer
+from .query_parameters_handler import QueryStringSocketHandler
+from .utils import ENV_VARIABLE
 
 
 class VoilaHandler(JupyterHandler):
@@ -45,17 +47,16 @@ async def get(self, path=None):
 
         # Adding request uri to kernel env
         kernel_env = os.environ.copy()
-        kernel_env['SCRIPT_NAME'] = self.request.path
+        kernel_env[ENV_VARIABLE.SCRIPT_NAME] = self.request.path
         kernel_env[
-            'PATH_INFO'
+            ENV_VARIABLE.PATH_INFO
         ] = ''  # would be /foo/bar if voila.ipynb/foo/bar was supported
-        kernel_env['QUERY_STRING'] = str(self.request.query)
-        kernel_env['SERVER_SOFTWARE'] = 'voila/{}'.format(__version__)
-        kernel_env['SERVER_PROTOCOL'] = str(self.request.version)
+        kernel_env[ENV_VARIABLE.QUERY_STRING] = str(self.request.query)
+        kernel_env[ENV_VARIABLE.SERVER_SOFTWARE] = 'voila/{}'.format(__version__)
+        kernel_env[ENV_VARIABLE.SERVER_PROTOCOL] = str(self.request.version)
         host, port = split_host_and_port(self.request.host.lower())
-        kernel_env['SERVER_PORT'] = str(port) if port else ''
-        kernel_env['SERVER_NAME'] = host
-
+        kernel_env[ENV_VARIABLE.SERVER_PORT] = str(port) if port else ''
+        kernel_env[ENV_VARIABLE.SERVER_NAME] = host
         # Add HTTP Headers as env vars following rfc3875#section-4.1.18
         if len(self.voila_configuration.http_header_envs) > 0:
             for header_name in self.request.headers:
@@ -92,10 +93,12 @@ async def get(self, path=None):
             # Get the pre-rendered content of notebook, the result can be all rendered cells
             # of the notebook or some rendred cells and a generator which can be used by this
             # handler to continue rendering calls.
-            render_task, rendered_cache = await self.kernel_manager.get_rendered_notebook(
+
+            render_task, rendered_cache, kernel_id = await self.kernel_manager.get_rendered_notebook(
                     notebook_name=notebook_path,
             )
 
+            QueryStringSocketHandler.send_updates({'kernel_id': kernel_id, 'payload': self.request.query})
             # Send rendered cell to frontend
             if len(rendered_cache) > 0:
                 self.write(''.join(rendered_cache))
@@ -139,6 +142,8 @@ def time_out():
                 self.write('<script>voila_heartbeat()</script>\n')
                 self.flush()
 
+            kernel_env[ENV_VARIABLE.VOILA_PREHEAT] = 'False'
+            kernel_env[ENV_VARIABLE.VOILA_BASE_URL] = self.base_url
             kernel_id = await ensure_async(
                 (
                     self.kernel_manager.start_kernel(
@@ -180,10 +185,5 @@ def should_use_rendered_notebook(
             return False
         if theme is not None and rendered_theme != theme:
             return False
-        args_list = [
-            key for key in request_args if key not in ['voila-template', 'voila-theme']
-        ]
-        if len(args_list) > 0:
-            return False
 
         return True
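
With the query-argument check removed, the decision to serve the pre-rendered notebook depends only on kernel availability and on the template and theme. The standalone function below is a simplified sketch of that decision as described in the documentation. Its name and signature are assumptions made for illustration and do not mirror the actual `should_use_rendered_notebook` method.

```python
def can_use_prerendered(kernel_available, rendered_template, rendered_theme, request_args):
    """Decide whether cached HTML may be served (hypothetical helper)."""
    if not kernel_available:
        # An empty pool means Voilà must start a normal kernel.
        return False
    template = request_args.get("voila-template")
    theme = request_args.get("voila-theme")
    # An override is acceptable only if it matches the pre-rendered value.
    if template is not None and template != rendered_template:
        return False
    if theme is not None and theme != rendered_theme:
        return False
    # Other query parameters no longer disable the preheated kernel.
    return True


print(can_use_prerendered(True, "lab", "light", {"foo": "bar"}))
print(can_use_prerendered(True, "lab", "light", {"voila-theme": "dark"}))
```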
diff --git a/voila/notebook_renderer.py b/voila/notebook_renderer.py
@@ -25,6 +25,7 @@
 from .execute import VoilaExecutor, strip_code_cell_warnings
 from .exporter import VoilaExporter
 from .paths import collect_template_paths
+from .utils import ENV_VARIABLE
 
 
 class NotebookRenderer(LoggingConfigurable):
@@ -221,7 +222,18 @@ async def _jinja_kernel_start(self, nb, kernel_id, kernel_future):
             self.executor.kc.wait_for_ready(timeout=self.executor.startup_timeout)
         )
         self.executor.kc.allow_stdin = False
-        ###
+        # Set the `VOILA_KERNEL_ID` environment variable so that users can
+        # identify which kernel the notebook uses.
+        if nb.metadata.kernelspec['language'] == 'python':
+            await ensure_async(
+                self.executor.kc.execute(
+                    f'''import os
+                    \nos.environ["{ENV_VARIABLE.VOILA_KERNEL_ID}"]="{kernel_id}"
+                    ''',
+                    store_history=False,
+                )
+            )
+
         self.kernel_started = True
         return kernel_id
 

diff --git a/voila/query_parameters_handler.py b/voila/query_parameters_handler.py
@@ -0,0 +1,61 @@
+from tornado.websocket import WebSocketHandler
+import logging
+from typing import Dict
+
+
+class QueryStringSocketHandler(WebSocketHandler):
+    """A websocket handler used to provide the query string
+    associated with kernel ids in preheat kernel mode.
+
+    Class variables
+    ---------------
+    - _waiters : A dictionary which holds the `websocket` connection
+    associated with the kernel id.
+
+    - _cache : A dictionary which holds the query string associated
+    with the kernel id.
+    """
+    _waiters = dict()
+    _cache = dict()
+
+    def open(self, kernel_id: str) -> None:
+        """Create a new websocket connection, this connection is
+        identified by the kernel id.
+
+        Args:
+            kernel_id (str): Kernel id used by the notebook when it opens
+            the websocket connection.
+        """
+        QueryStringSocketHandler._waiters[kernel_id] = self
+        if kernel_id in self._cache:
+            self.write_message(self._cache[kernel_id])
+
+    def on_close(self) -> None:
+        # Remove only this connection; do nothing if it was never registered.
+        for k_id, waiter in list(QueryStringSocketHandler._waiters.items()):
+            if waiter is self:
+                del QueryStringSocketHandler._waiters[k_id]
+
+    @classmethod
+    def send_updates(cls: 'QueryStringSocketHandler', msg: Dict) -> None:
+        """Class method used to dispatch the query string to the waiting
+        notebook. This method is called in `VoilaHandler` when the query
+        string becomes available.
+        If this method is called before the websocket connection is opened,
+        the payload is stored in `_cache` and will be dispatched when
+        a notebook with the corresponding kernel id connects.
+
+        Args:
+            - msg (Dict): this dictionary contains the `kernel_id` to identify
+            the waiting notebook and `payload` is the query string.
+        """
+        kernel_id = msg['kernel_id']
+        payload = msg['payload']
+        waiter = cls._waiters.get(kernel_id, None)
+        if waiter is not None:
+            try:
+                waiter.write_message(payload)
+            except Exception:
+                logging.error("Error sending message", exc_info=True)
+
+        cls._cache[kernel_id] = payload
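
The handler above covers two orderings: the query string can arrive before the kernel opens its websocket, or after. The tornado-free model below is a sketch of that bookkeeping only. `FakeSocket` and `QueryStringRegistry` are hypothetical stand-ins written for this illustration and do not exist in Voilà.

```python
class FakeSocket:
    """Stand-in for a websocket connection; records written messages."""

    def __init__(self):
        self.messages = []

    def write_message(self, message):
        self.messages.append(message)


class QueryStringRegistry:
    """Hypothetical model of the waiter/cache bookkeeping."""

    def __init__(self):
        self.waiters = {}  # kernel id -> open connection
        self.cache = {}    # kernel id -> last query string seen

    def open(self, kernel_id, socket):
        self.waiters[kernel_id] = socket
        # Deliver a query string that arrived before the connection.
        if kernel_id in self.cache:
            socket.write_message(self.cache[kernel_id])

    def close(self, socket):
        # Remove only the entries that belong to this connection.
        for kernel_id, waiter in list(self.waiters.items()):
            if waiter is socket:
                del self.waiters[kernel_id]

    def send_updates(self, msg):
        kernel_id, payload = msg["kernel_id"], msg["payload"]
        waiter = self.waiters.get(kernel_id)
        if waiter is not None:
            waiter.write_message(payload)
        self.cache[kernel_id] = payload


registry = QueryStringRegistry()

# Case 1: the query string arrives before the kernel connects.
registry.send_updates({"kernel_id": "k1", "payload": "foo=bar"})
early = FakeSocket()
registry.open("k1", early)

# Case 2: the kernel is already waiting when the query string arrives.
late = FakeSocket()
registry.open("k2", late)
registry.send_updates({"kernel_id": "k2", "payload": "x=1"})

print(early.messages, late.messages)
```

In both cases the waiting notebook receives the query string exactly once, which is the property the cache is there to provide.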