
track_event() mechanism for analytics #2240

Closed
Tracked by #2251
simonw opened this issue Jan 31, 2024 · 8 comments

Comments

simonw (Owner) commented Jan 31, 2024

For Datasette Cloud I need analytics that help me understand which features people are using.

I also want to provide a more detailed audit log for my customers, so they can keep track of what's been happening in their own instance.

I'm going to add a datasette.track_event(event_name, actor, properties) method which calls a plugin hook that can log or act on events.

Bonus: this means plugins that aren't tracking analytics can hook into the same system, e.g. a plugin that wants to run additional code any time someone creates a new table.
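A minimal, framework-free sketch of that idea follows. One `track_event()` call fans out to every registered listener, so an analytics logger and a plugin that reacts to table creation can share the same events. `HANDLERS`, `register()` and this synchronous `track_event()` are hypothetical stand-ins for the plugin hook, not Datasette APIs. The `table-create` event name is borrowed from the prototype patch in the next comment.

```python
from typing import Callable, Optional

# Hypothetical registry standing in for the plugin hook: every registered
# function is called for every tracked event.
HANDLERS: list[Callable[..., None]] = []


def register(fn):
    HANDLERS.append(fn)
    return fn


def track_event(name: str, actor: Optional[dict], properties: Optional[dict] = None) -> None:
    properties = properties or {}
    for handler in HANDLERS:
        handler(name, actor, properties)


analytics_log: list[tuple] = []
tables_created: list[str] = []


@register
def log_for_analytics(name, actor, properties):
    # An analytics listener records every event it sees.
    analytics_log.append((name, (actor or {}).get("id"), properties))


@register
def on_table_create(name, actor, properties):
    # A non-analytics listener that only reacts to one kind of event.
    if name == "table-create":
        tables_created.append(properties["table"])


track_event("table-create", {"id": "root"}, {"database": "data", "table": "dogs"})
track_event("logged-out", {"id": "root"})
```

Neither listener knows about the other. Each one simply ignores the events it does not care about.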

simonw (Owner, Author) commented Jan 31, 2024

I built a quick prototype of this like so:

diff --git a/datasette/app.py b/datasette/app.py
index 482cebb4..83a083b5 100644
--- a/datasette/app.py
+++ b/datasette/app.py
@@ -873,6 +873,15 @@ class Datasette:
         result = await await_me_maybe(result)
         return result
 
+    async def track_event(
+        self, name: str, actor: dict, properties: Optional[dict] = None
+    ):
+        properties = properties or {}
+        for hook in pm.hook.track_event(
+            datasette=self, name=name, actor=actor, properties=properties
+        ):
+            await await_me_maybe(hook)
+
     async def permission_allowed(
         self, actor, action, resource=None, default=DEFAULT_NOT_SET
     ):
diff --git a/datasette/hookspecs.py b/datasette/hookspecs.py
index 2f4c6027..6370d1e8 100644
--- a/datasette/hookspecs.py
+++ b/datasette/hookspecs.py
@@ -160,6 +160,11 @@ def handle_exception(datasette, request, exception):
     """Handle an uncaught exception. Can return a Response or None."""
 
 
+@hookspec
+def track_event(datasette, name, actor, properties):
+    """Respond to a named event tracked by Datasette"""
+
+
 @hookspec
 def top_homepage(datasette, request):
     """HTML to include at the top of the homepage"""
diff --git a/datasette/views/database.py b/datasette/views/database.py
index eac01ab6..23555ef2 100644
--- a/datasette/views/database.py
+++ b/datasette/views/database.py
@@ -969,6 +969,15 @@ class TableCreateView(BaseView):
         }
         if rows:
             details["row_count"] = len(rows)
+        await self.ds.track_event(
+            "table-create",
+            request.actor,
+            {
+                "database": db.name,
+                "table": table_name,
+                "schema": schema,
+            },
+        )
         return Response.json(details, status=201)
 
 
diff --git a/datasette/views/special.py b/datasette/views/special.py
index 849750bf..891c41f1 100644
--- a/datasette/views/special.py
+++ b/datasette/views/special.py
@@ -105,6 +105,7 @@ class LogoutView(BaseView):
         response = Response.redirect(self.ds.urls.instance())
         response.set_cookie("ds_actor", "", expires=0, max_age=0)
         self.ds.add_message(request, "You are now logged out", self.ds.WARNING)
+        await self.ds.track_event("logged-out", request.actor)
         return response
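The `await await_me_maybe(hook)` loop in the patch is what lets a plugin implement `track_event` as either a plain function or an `async def` function. When pluggy (which Datasette uses for its plugin hooks) calls the implementations, a plain function runs immediately. An async one only returns a coroutine. Pluggy collects the non-None return values into a list. The sketch below imitates that without pluggy. `maybe_await()` is a hypothetical helper written here, on the assumption that `await_me_maybe` calls callables and awaits awaitables.

```python
import asyncio
import inspect
from typing import Any

events_seen: list[str] = []


async def maybe_await(value: Any) -> Any:
    # Hypothetical stand-in for Datasette's await_me_maybe():
    # call it if callable, await it if awaitable, otherwise return it.
    if callable(value):
        value = value()
    if inspect.isawaitable(value):
        value = await value
    return value


def sync_plugin(name: str) -> None:
    # The body runs as soon as the hook is called; it returns None.
    events_seen.append(f"sync:{name}")


async def async_plugin(name: str) -> None:
    # Calling this only creates a coroutine; the body runs when awaited.
    await asyncio.sleep(0)
    events_seen.append(f"async:{name}")


async def track_event(name: str) -> None:
    results = []
    for plugin in (sync_plugin, async_plugin):
        result = plugin(name)
        if result is not None:  # like pluggy, keep only non-None results
            results.append(result)
    for result in results:
        await maybe_await(result)


asyncio.run(track_event("logged-out"))
```

If the coroutine returned by an async implementation were never awaited, its body would not run and Python would emit a "never awaited" warning. That is why the loop awaits every result.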

Here's an example plugin that uses it:

from datasette import hookimpl
import datetime
import json
import sys


@hookimpl
def track_event(name, actor, properties):
    msg = json.dumps(
        {
            "dt": datetime.datetime.utcnow().isoformat(),
            "name": name,
            "actor": actor,
            "properties": properties,
        }
    )
    print(msg, file=sys.stderr, flush=True)

I really like how simple this is. In the above patch I've instrumented two places:

  • A user logs out
  • A user creates a new table via the API

The instrumentation was very simple, but I think I want to formalize the way the properties work a bit more - maybe by defining them as dataclasses? Something that makes them easy to clearly document would be good.

simonw (Owner, Author) commented Jan 31, 2024

Using dataclasses ends up looking like this:

await self.ds.track_event(
    CreateTableEvent(
        request.actor, database=db.name, table=table_name, schema=schema
    )
)

Class defined here:

@dataclass
class CreateTableEvent(Event):
    name = "create-table"
    database: str
    table: str
    schema: str
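Here is a self-contained version of those two snippets that runs without Datasette. The `Event` base class is not shown in this comment, so its single `actor` field is an assumption. The `properties()` helper, which flattens an event into a dictionary with `dataclasses.asdict()`, is also hypothetical here.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Event:
    # Assumed base class: every event records who triggered it.
    actor: Optional[dict]

    def properties(self) -> dict:
        # Every field except the actor, as a plain dict.
        props = asdict(self)
        props.pop("actor")
        return props


@dataclass
class CreateTableEvent(Event):
    # No annotation, so this is a class attribute rather than a dataclass field.
    name = "create-table"
    database: str
    table: str
    schema: str


event = CreateTableEvent(
    {"id": "root"}, database="data", table="dogs", schema="create table dogs (id integer)"
)
```

Because `name` has no annotation, it is neither a constructor argument nor part of the `asdict()` output. The positional `actor` comes first, matching the `CreateTableEvent(request.actor, ...)` call.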

simonw (Owner, Author) commented Jan 31, 2024

A hopefully complete list of events in the Datasette core app that I want to track:

  • Login (using the /-/auth-token mechanism; I think that's the only way to log in to default Datasette)
  • Logout
  • Create an API token
  • API stuff
    • Create a table
    • Insert rows (tricky due to not necessarily knowing how many rows were inserted)
    • Upsert rows
    • Delete rows
    • Drop a table
    • Update a row

I think that's everything for the core application. The vast majority of events will be tracked in plugins such as datasette-upload-csvs, datasette-edit-schema, datasette-configure-fts, datasette-enrichments and suchlike.
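With events split between core and plugins, something has to collect the event classes and stop two of them from claiming the same name. A minimal sketch follows, assuming each class carries a `name` class attribute as `CreateTableEvent` does. `build_event_registry()`, `LogoutEvent` and `UploadCsvEvent` are hypothetical, and this thread does not say how plugins would actually supply their classes.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Event:
    actor: Optional[dict]


@dataclass
class LogoutEvent(Event):
    name = "logout"


@dataclass
class CreateTableEvent(Event):
    name = "create-table"
    database: str
    table: str


@dataclass
class UploadCsvEvent(Event):
    # Hypothetical event a plugin such as datasette-upload-csvs might define.
    name = "upload-csv"
    database: str
    table: str


def build_event_registry(*groups: Iterable[type]) -> dict[str, type]:
    # Map event name -> class, refusing duplicate names across core and plugins.
    registry: dict[str, type] = {}
    for group in groups:
        for cls in group:
            if not issubclass(cls, Event):
                raise TypeError(f"{cls!r} is not an Event subclass")
            if cls.name in registry:
                raise ValueError(f"Duplicate event name: {cls.name}")
            registry[cls.name] = cls
    return registry


core_events = [LogoutEvent, CreateTableEvent]
plugin_events = [UploadCsvEvent]
registry = build_event_registry(core_events, plugin_events)
```

Failing loudly on a duplicate name at startup is cheaper than discovering later that two plugins' analytics were merged under one label.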

simonw (Owner, Author) commented Jan 31, 2024

Question: if I'm using these dataclass-based events, should they be passed through to the plugin hook consumers? At the moment I'm breaking them down into dictionaries and strings instead, like this:

datasette/datasette/app.py, lines 884 to 893 in f003284:

    async def track_event(self, event: Event):
        assert isinstance(event, self.event_classes), "Invalid event type: {}".format(
            type(event)
        )
        properties = dataclasses.asdict(event)
        actor = properties.pop("actor")
        for hook in pm.hook.track_event(
            datasette=self, name=event.name, actor=actor, properties=properties
        ):
            await await_me_maybe(hook)

I think passing the actual objects is probably better.
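One reason to pass the objects themselves is that a hook implementation can branch on the event's class and read typed attributes, instead of matching name strings and digging through a dictionary. The sketch below uses a hypothetical `Event`/`CreateTableEvent`/`LogoutEvent` trio and a hook that receives the event object directly. That hook signature is an assumption at this point in the thread.

```python
from dataclasses import dataclass
from typing import ClassVar, Optional


@dataclass
class Event:
    # ClassVar keeps "name" out of the dataclass fields; subclasses set it.
    name: ClassVar[str]
    actor: Optional[dict]


@dataclass
class CreateTableEvent(Event):
    name = "create-table"
    database: str
    table: str
    schema: str


@dataclass
class LogoutEvent(Event):
    name = "logout"


audit_log: list[str] = []


def track_event(event: Event) -> None:
    # Hypothetical hook implementation receiving the event object itself.
    who = (event.actor or {}).get("id", "anonymous")
    if isinstance(event, CreateTableEvent):
        # isinstance() narrows the type, so event.table is known to type checkers.
        audit_log.append(f"{who} created {event.database}/{event.table}")
    else:
        audit_log.append(f"{who}: {event.name}")


track_event(
    CreateTableEvent(
        {"id": "root"}, database="data", table="dogs", schema="create table dogs (id integer)"
    )
)
track_event(LogoutEvent(None))
```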

simonw (Owner, Author) commented Jan 31, 2024

I could generate the documentation for these by introspecting the code and using docstrings on the different classes. Maybe even present it as reference documentation generated from the code.
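Generating that reference documentation could be as simple as walking the event classes, reading each docstring with `inspect.getdoc()` and listing the fields from `dataclasses.fields()`. The sketch below emits reStructuredText-style text, since Datasette's documentation is written in reStructuredText. The output layout, the `document_events()` name and the example docstrings are assumptions.

```python
import inspect
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Event:
    """Base class for all events."""

    actor: Optional[dict]


@dataclass
class CreateTableEvent(Event):
    """A new table was created via the API."""

    name = "create-table"
    database: str
    table: str
    schema: str


def type_name(t) -> str:
    # Plain classes print as their name; typing constructs fall back to str().
    return t.__name__ if isinstance(t, type) else str(t)


def document_events(*classes: type) -> str:
    lines = []
    for cls in classes:
        title = f"{cls.name} ({cls.__name__})"
        lines += [title, "-" * len(title), "", inspect.getdoc(cls) or "", ""]
        for f in fields(cls):
            if f.name == "actor":
                continue  # shared by every event, so documented once elsewhere
            lines.append(f"- ``{f.name}``: {type_name(f.type)}")
        lines.append("")
    return "\n".join(lines)


print(document_events(CreateTableEvent))
```

Running something like this in the docs build would keep the reference in step with the classes, because a new field can't be added without showing up in the output.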

simonw (Owner, Author) commented Jan 31, 2024

I should add a datetime property to the Event base class, and ensure it gets populated correctly with a UTC value at the right moment - probably when the datasette.track_event(...) method is called.
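One way to add that timestamp without disturbing the subclasses' constructor arguments is a field declared with `init=False`. Because it is left out of `__init__`, it does not trigger the error Python raises when a field without a default follows one with a default. In this sketch a hypothetical standalone `track_event()` stamps it in UTC at the moment the event is tracked, as suggested above. The `created` field name is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Event:
    actor: Optional[dict]
    # init=False keeps this out of __init__, so subclasses can still declare
    # fields without defaults after it.
    created: Optional[datetime] = field(default=None, init=False)


@dataclass
class CreateTableEvent(Event):
    name = "create-table"
    database: str
    table: str


tracked: list[Event] = []


def track_event(event: Event) -> None:
    # Hypothetical: stamp the event in UTC at the moment it is tracked.
    event.created = datetime.now(timezone.utc)
    tracked.append(event)


event = CreateTableEvent({"id": "root"}, database="data", table="dogs")
created_before_tracking = event.created  # None: nothing has stamped it yet
track_event(event)
```

Stamping inside `track_event()` rather than with a `default_factory` means the time reflects when the event was recorded, not when the object happened to be constructed.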

simonw (Owner, Author) commented Jan 31, 2024

simonw added this to the Datasette 1.0a8 milestone Jan 31, 2024
simonw (Owner, Author) commented Jan 31, 2024

Updated example plugin after the change to pass event objects to the hook:

from datasette import hookimpl
import datetime
import json
import sys


@hookimpl
def track_event(event):
    name = event.name
    actor = event.actor
    properties = event.properties()
    msg = json.dumps(
        {
            "dt": datetime.datetime.utcnow().isoformat(),
            "name": name,
            "actor": actor,
            "properties": properties,
        }
    )
    print(msg, file=sys.stderr, flush=True)
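To try the updated plugin without a running Datasette instance, the function can be called directly with a stand-in event. In this sketch the `@hookimpl` decorator and the `datasette` import are dropped so it runs on its own. The stub `LogoutEvent` and its `properties()` method are assumptions modeled on the dataclass design discussed above. Stderr is captured with `contextlib.redirect_stderr` to check that each event becomes one JSON line. The sketch also swaps `utcnow()` (deprecated since Python 3.12) for a timezone-aware timestamp, which adds a `+00:00` suffix.

```python
import contextlib
import datetime
import io
import json
import sys
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class LogoutEvent:
    # Hypothetical stand-in for a Datasette event object.
    name = "logout"
    actor: Optional[dict]

    def properties(self) -> dict:
        props = asdict(self)
        props.pop("actor")
        return props


def track_event(event):
    # Equivalent to the plugin above, minus @hookimpl, with a timezone-aware
    # timestamp in place of utcnow().
    msg = json.dumps(
        {
            "dt": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "name": event.name,
            "actor": event.actor,
            "properties": event.properties(),
        }
    )
    print(msg, file=sys.stderr, flush=True)


buffer = io.StringIO()
with contextlib.redirect_stderr(buffer):
    track_event(LogoutEvent({"id": "root"}))

record = json.loads(buffer.getvalue())
```

One JSON object per line keeps the output easy to feed into log collectors that expect newline-delimited JSON.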
