Emit events from the Contents Service #954

Zsailer · 2022-08-29T19:14:32Z

Uses the event logger in #862 to emit events from the Contents Manager.

Borrows the contents schema from #364

Zsailer · 2022-08-29T19:15:33Z

Pinging @dlqqq for feedback for your use-case in the File ID service proposed in #921.

Zsailer · 2022-08-29T19:20:29Z

Here's an example snippet of an extension "listening" to events emitted by the contents manager in this PR:

class MyExtension(ExtensionApp):

    ... 

    def initialize_settings(self):
        event_logger = self.serverapp.event_logger

        async def my_listener(logger: EventLogger, schema_id: str, data: dict) -> None:
            print(f"THIS WAS SEEN!: {data}")

        event_logger.add_listener(
            schema_id="https://events.jupyter.org/jupyter_server/contents_service/v1", listener=my_listener)

        ...

Let me know what you think!

codecov-commenter · 2022-08-29T19:29:46Z

Codecov Report

Merging #954 (f0ba8db) into main (63bb2a9) will increase coverage by 0.13%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #954      +/-   ##
==========================================
+ Coverage   72.07%   72.20%   +0.13%     
==========================================
  Files          64       64              
  Lines        8031     8063      +32     
  Branches     1340     1342       +2     
==========================================
+ Hits         5788     5822      +34     
+ Misses       1837     1836       -1     
+ Partials      406      405       -1

Impacted Files	Coverage Δ
jupyter_server/__init__.py	`70.58% <100.00%> (+6.30%)`	⬆️
jupyter_server/serverapp.py	`65.76% <100.00%> (+0.15%)`	⬆️
jupyter_server/services/contents/filemanager.py	`71.62% <100.00%> (+0.22%)`	⬆️
jupyter_server/services/contents/manager.py	`83.52% <100.00%> (+0.80%)`	⬆️
jupyter_server/services/kernels/handlers.py	`59.07% <0.00%> (+0.42%)`	⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

dlqqq · 2022-08-29T20:21:01Z

@Zsailer Wow! You beat me to it, I was just playing around with my own custom implementation. This looks great.

jupyter_server/serverapp.py

dlqqq · 2022-08-29T20:29:29Z

jupyter_server/services/contents/filemanager.py

@@ -395,6 +395,7 @@ def get(self, path, content=True, type=None, format=None):
            if type == "directory":
                raise web.HTTPError(400, "%s is not a directory" % path, reason="bad type")
            model = self._file_model(path, content=content, format=format)
+        self.emit(data={"action": "get", "path": path})


I have a vague performance concern about emitting an event on every get and save. These are by far the most common ContentsManager actions called by clients. In my opinion, we should only emit events that have demonstrated a developer need. For my File ID manager, I do not need these events. If there are no known uses for these events, I suggest we remove these events entirely rather than attempt to anticipate future uses at the cost of performance.

In my opinion, we should only emit events that have demonstrated a developer need

I think this is too stringent of a condition.

Events certainly have use-cases outside of just developers. Operators/admins look to use this event system to audit user activity—in fact, this was the original demand driving the jupyter events/telemetry efforts mentioned in this Jupyter Telemetry enhancement proposal (jupyter/enhancement-proposals#41).

If this work is causing a noticeable performance degradation, we should address it rather than limit the usage of the event system for operators.

Also, keep in mind, in scenarios where there are no handlers or listeners, the .emit method immediately returns: https://github.com/jupyter/jupyter_events/blob/50746633a2adc7e41e2e0e1b0631db10ffb165db/jupyter_events/logger.py#L337-L342

Ah, I think we actually are in agreement; I forgot Jupyter maintains a distinction between developers and operators, and I was using the word "developer" to mean both. Yes, we should definitely be conscious of performance when building this, though I'm sure it's quite difficult to measure.

Also, keep in mind, in scenarios where there are no handlers or listeners, the .emit method immediately returns:

I took a look at that logic, and I'm not sure if that's sufficient. That checks if there is are any handlers/listeners attached to the current event logger, meaning if I add a listener to another schema or another event type, then a Contents Manager event still performs a deep copy, validation, and building an event capsule only to do nothing with it.

Of course, this is out of scope for the PR. I'll make an issue of this in Jupyter Events for further discussion.

Ah, I think we actually are in agreement; I forgot Jupyter maintains a distinction between developers and operators, and I was using the word "developer" to mean both. Yes, we should definitely be conscious of performance when building this, though I'm sure it's quite difficult to measure.

I think the main point is that Jupyter Server should reach a future state where it provides high volumes of fine-grained, structured event data from most areas across Jupyter Server, allowing operators to get detailed information about what's happening in a running server. In many deployment scenarios, it's required that operators keep a detailed log of everything happening in the application.

If performance is the concern, the answer isn't to avoid adding events; it's to improve performance. I think we can do it! 😎

I took a look at that logic, and I'm not sure if that's sufficient. That checks if there is are any handlers/listeners attached to the current event logger, meaning if I add a listener to another schema or another event type, then a Contents Manager event still performs a deep copy, validation, and building an event capsule only to do nothing with it.

Yeah, that's right. I think we should update jupyter_events to check if the specific event is being watched by any listeners.

This is a little trickier (not impossible) to do with handlers, since we don't keep a mapping of handlers to their specific events (consequence of using Python's logging libraries for the main EventLogger). All handlers listen to all events, unless someone adds a logging.Filter object to look for specific events. The challenge is that we can't inspect the filter objects easily. I have some ideas on how we can solve this issue in jupyter_events, but this shouldn't block the PR here.

jupyter_server/event_schemas/contents_service/v1.yaml

dlqqq · 2022-08-29T20:33:09Z

@Zsailer Completed my review. 👀

Zsailer · 2022-08-30T00:05:11Z

Thank you for the review @dlqqq! I appreciate it!

It's probably worth doing some benchmarking tests to see how much event emission affects the server's performance. I'll try to find time to build these tests later this week.

dlqqq

Looks good so far. Great work! Not sure if you wanted to tackle performance before/after merging this. LMK your thoughts.

Zsailer · 2022-08-30T19:55:19Z

Looks good so far. Great work! Not sure if you wanted to tackle performance before/after merging this. LMK your thoughts.

Thanks @dlqqq! I think all of the performance issues can/should be solved in the jupyter_events library itself. I think we can merge this PR here separately from those changes and just bump the version of jupyter_events here when it's ready.

emit events from the contents manager

88b4da1

Zsailer added the enhancement label Aug 29, 2022

Zsailer added this to the 2.0 milestone Aug 29, 2022

remove unused imports

607a052

Zsailer mentioned this pull request Aug 29, 2022

[Telemetry] Add basic telemetry to Jupyter server and begin emitting server events #364

Closed

return default event_logger in contents manager

f0ba8db

dlqqq reviewed Aug 29, 2022

View reviewed changes

jupyter_server/serverapp.py Show resolved Hide resolved

dlqqq reviewed Aug 29, 2022

View reviewed changes

jupyter_server/event_schemas/contents_service/v1.yaml Outdated Show resolved Hide resolved

remove unused actions

8890717

dlqqq approved these changes Aug 30, 2022

View reviewed changes

Zsailer marked this pull request as ready for review August 31, 2022 20:21

Zsailer merged commit 6dc7d53 into jupyter-server:main Aug 31, 2022

Zsailer mentioned this pull request Sep 1, 2022

An event system for Jupyter #780

Open

Zsailer deleted the contents-service-events branch January 16, 2024 21:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Emit events from the Contents Service #954

Emit events from the Contents Service #954

Zsailer commented Aug 29, 2022

Zsailer commented Aug 29, 2022

Zsailer commented Aug 29, 2022

codecov-commenter commented Aug 29, 2022 •

edited

Loading

dlqqq commented Aug 29, 2022

dlqqq Aug 29, 2022 •

edited

Loading

Zsailer Aug 29, 2022

dlqqq Aug 30, 2022

Zsailer Aug 30, 2022

dlqqq commented Aug 29, 2022

Zsailer commented Aug 30, 2022

dlqqq left a comment

Zsailer commented Aug 30, 2022

Emit events from the Contents Service #954

Emit events from the Contents Service #954

Conversation

Zsailer commented Aug 29, 2022

Zsailer commented Aug 29, 2022

Zsailer commented Aug 29, 2022

codecov-commenter commented Aug 29, 2022 • edited Loading

Codecov Report

dlqqq commented Aug 29, 2022

dlqqq Aug 29, 2022 • edited Loading

Choose a reason for hiding this comment

Zsailer Aug 29, 2022

Choose a reason for hiding this comment

dlqqq Aug 30, 2022

Choose a reason for hiding this comment

Zsailer Aug 30, 2022

Choose a reason for hiding this comment

dlqqq commented Aug 29, 2022

Zsailer commented Aug 30, 2022

dlqqq left a comment

Choose a reason for hiding this comment

Zsailer commented Aug 30, 2022

codecov-commenter commented Aug 29, 2022 •

edited

Loading

dlqqq Aug 29, 2022 •

edited

Loading