-
-
Notifications
You must be signed in to change notification settings - Fork 695
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Traces should include SQL executed by subtasks created with asyncio.gather
#1576
Comments
Here's the full current relevant code from Lines 8 to 64 in ace8656
|
One way to solve this would be to introduce a It would be really nice if I could solve this using with trace_child_tasks():
(
suggested_facets,
(facet_results, facets_timed_out),
) = await asyncio.gather(
execute_suggested_facets(),
execute_facets(),
) |
This article is relevant: Context information storage for asyncio - in particular the section https://blog.sqreen.com/asyncio/#context-inheritance-between-tasks which describes exactly the problem I have and their solution, which involves this trickery: def request_task_factory(loop, coro):
child_task = asyncio.tasks.Task(coro, loop=loop)
parent_task = asyncio.Task.current_task(loop=loop)
current_request = getattr(parent_task, 'current_request', None)
setattr(child_task, 'current_request', current_request)
return child_task
loop = asyncio.get_event_loop()
loop.set_task_factory(request_task_factory) They released their solution as a library: https://pypi.org/project/aiocontext/ and https://github.com/sqreen/AioContext - but that company was acquired by Datadog back in April and doesn't seem to be actively maintaining their open source stuff any more: https://twitter.com/SqreenIO/status/1384906075506364417 |
Python 3.6 support ends in a few days time, and it looks like Glitch has updated to 3.7 now - so maybe I can get away with Datasette needing 3.7 these days? Tweeted about that here: https://twitter.com/simonw/status/1473761478155010048 |
Another option: https://github.com/Skyscanner/aiotask-context - looks like it might be better as it's been updated for Python 3.7 in this commit Skyscanner/aiotask-context@67108c9 The Skyscanner one doesn't attempt to wrap any existing factories, but that's OK for my purposes since I don't need to handle arbitrary |
It's tiny: I'm tempted to vendor it. https://github.com/Skyscanner/aiotask-context/blob/master/aiotask_context/__init__.py No, I'll add it as a pinned dependency, which I can then drop when I drop 3.6 support. |
Another option would be to attempt to import |
I dropped support for Python 3.6 in fae3983 so now free to use |
Here's what I was hacking around with when I uncovered this problem: diff --git a/datasette/views/table.py b/datasette/views/table.py
index 77fb285..8c57d08 100644
--- a/datasette/views/table.py
+++ b/datasette/views/table.py
@@ -1,3 +1,4 @@
+import asyncio
import urllib
import itertools
import json
@@ -615,44 +616,37 @@ class TableView(RowTableShared):
if request.args.get("_timelimit"):
extra_args["custom_time_limit"] = int(request.args.get("_timelimit"))
- # Execute the main query!
- results = await db.execute(sql, params, truncate=True, **extra_args)
-
- # Calculate the total count for this query
- filtered_table_rows_count = None
- if (
- not db.is_mutable
- and self.ds.inspect_data
- and count_sql == f"select count(*) from {table} "
- ):
- # We can use a previously cached table row count
- try:
- filtered_table_rows_count = self.ds.inspect_data[database]["tables"][
- table
- ]["count"]
- except KeyError:
- pass
-
- # Otherwise run a select count(*) ...
- if count_sql and filtered_table_rows_count is None and not nocount:
- try:
- count_rows = list(await db.execute(count_sql, from_sql_params))
- filtered_table_rows_count = count_rows[0][0]
- except QueryInterrupted:
- pass
-
- # Faceting
- if not self.ds.setting("allow_facet") and any(
- arg.startswith("_facet") for arg in request.args
- ):
- raise BadRequest("_facet= is not allowed")
+ async def execute_count():
+ # Calculate the total count for this query
+ filtered_table_rows_count = None
+ if (
+ not db.is_mutable
+ and self.ds.inspect_data
+ and count_sql == f"select count(*) from {table} "
+ ):
+ # We can use a previously cached table row count
+ try:
+ filtered_table_rows_count = self.ds.inspect_data[database][
+ "tables"
+ ][table]["count"]
+ except KeyError:
+ pass
+
+ if count_sql and filtered_table_rows_count is None and not nocount:
+ try:
+ count_rows = list(await db.execute(count_sql, from_sql_params))
+ filtered_table_rows_count = count_rows[0][0]
+ except QueryInterrupted:
+ pass
+
+ return filtered_table_rows_count
+
+ filtered_table_rows_count = await execute_count()
# pylint: disable=no-member
facet_classes = list(
itertools.chain.from_iterable(pm.hook.register_facet_classes())
)
- facet_results = {}
- facets_timed_out = []
facet_instances = []
for klass in facet_classes:
facet_instances.append(
@@ -668,33 +662,58 @@ class TableView(RowTableShared):
)
)
- if not nofacet:
- for facet in facet_instances:
- (
- instance_facet_results,
- instance_facets_timed_out,
- ) = await facet.facet_results()
- for facet_info in instance_facet_results:
- base_key = facet_info["name"]
- key = base_key
- i = 1
- while key in facet_results:
- i += 1
- key = f"{base_key}_{i}"
- facet_results[key] = facet_info
- facets_timed_out.extend(instance_facets_timed_out)
-
- # Calculate suggested facets
- suggested_facets = []
- if (
- self.ds.setting("suggest_facets")
- and self.ds.setting("allow_facet")
- and not _next
- and not nofacet
- and not nosuggest
- ):
- for facet in facet_instances:
- suggested_facets.extend(await facet.suggest())
+ async def execute_suggested_facets():
+ # Calculate suggested facets
+ suggested_facets = []
+ if (
+ self.ds.setting("suggest_facets")
+ and self.ds.setting("allow_facet")
+ and not _next
+ and not nofacet
+ and not nosuggest
+ ):
+ for facet in facet_instances:
+ suggested_facets.extend(await facet.suggest())
+ return suggested_facets
+
+ async def execute_facets():
+ facet_results = {}
+ facets_timed_out = []
+ if not self.ds.setting("allow_facet") and any(
+ arg.startswith("_facet") for arg in request.args
+ ):
+ raise BadRequest("_facet= is not allowed")
+
+ if not nofacet:
+ for facet in facet_instances:
+ (
+ instance_facet_results,
+ instance_facets_timed_out,
+ ) = await facet.facet_results()
+ for facet_info in instance_facet_results:
+ base_key = facet_info["name"]
+ key = base_key
+ i = 1
+ while key in facet_results:
+ i += 1
+ key = f"{base_key}_{i}"
+ facet_results[key] = facet_info
+ facets_timed_out.extend(instance_facets_timed_out)
+
+ return facet_results, facets_timed_out
+
+ # Execute the main query, facets and facet suggestions in parallel:
+ (
+ results,
+ suggested_facets,
+ (facet_results, facets_timed_out),
+ ) = await asyncio.gather(
+ db.execute(sql, params, truncate=True, **extra_args),
+ execute_suggested_facets(),
+ execute_facets(),
+ )
+
+ results = await db.execute(sql, params, truncate=True, **extra_args)
# Figure out columns and rows for the query
columns = [r[0] for r in results.description] It's a hacky attempt at running some of the table page queries in parallel to see what happens. |
Needs documentation. I'll document |
New documentation section: https://docs.datasette.io/en/latest/internals.html#datasette-tracer |
I tried running some parallel SQL queries using
asyncio.gather()
but the SQL that was executed didn't show up in the trace rendered by https://datasette.io/plugins/datasette-pretty-tracesI realized that was because traces are keyed against the current task ID, which changes when a sub-task is run using
asyncio.gather
or similar.The faceting and suggest faceting queries are missing from this trace:
Originally posted by @simonw in #1518 (comment)
The text was updated successfully, but these errors were encountered: