Allow users to estimate query cost before executing it #8172
a bunch of comments, also @john-bodley or @villebro should probably double check the db engine specs stuff
superset/migrations/versions/8786d6374caa_add_column_for_query_estimate.py
@etr2460, I addressed all the comments. I removed the DB migration, and the feature is enabled per DB in extras. Also, I use the DB version to determine if it's supported.
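The gating described above (global feature flag, per-database opt-in via extras, engine support) could be sketched roughly as follows. This is only an illustration: the function name `cost_estimate_enabled` and the extras key `cost_estimate_enabled` are hypothetical and not taken from the PR; only the flag name `ESTIMATE_QUERY_COST` comes from the PR description.

```python
import json
from typing import Any, Dict, Optional


def cost_estimate_enabled(
    feature_flags: Dict[str, bool],
    extra: Optional[str],
    engine_supports_it: bool,
) -> bool:
    """Return True only when every gate is open (hypothetical helper)."""
    # Gate 1: the global feature flag named in the PR description.
    if not feature_flags.get("ESTIMATE_QUERY_COST", False):
        return False

    # Gate 2: the per-database "extra" JSON blob must opt in.
    # The key name below is an assumption made for this sketch.
    try:
        extra_json: Any = json.loads(extra) if extra else {}
    except json.JSONDecodeError:
        return False
    if not isinstance(extra_json, dict):
        return False
    if not extra_json.get("cost_estimate_enabled", False):
        return False

    # Gate 3: the engine spec (for example via a version check) must support it.
    return engine_supports_it
```

With this shape, a database whose extras do not opt in behaves the same as one where the flag is off, which matches the "unmodified" behaviour the PR describes for unsupported or non-enabled DBs.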
prefixes = ["K", "M", "G", "T", "P", "E", "Z", "Y"]
prefix = ""
to_next_prefix = 1000
commenting again, shouldn't this be 1024? And we should make it a const
Sorry, I replied to your comment but I think I resolved the conversation. This is used not just for bytes, but also for CPU and network cost, so 1000 is the correct base. Also, 1000 is the correct base for the prefixes K, M, G, etc. For 1024 the prefixes are Ki, Mi, Gi.
E.g., 1024 B = 1 KiB = 1.024 KB.
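To make the distinction concrete, here is a minimal sketch of a typed `humanize` helper with the step pulled out into a constant, as suggested in the thread. It is not the PR's implementation; the constant names and the two-decimal output format are assumptions made for illustration.

```python
from typing import Sequence

# Decimal (SI) prefixes step by 1000; binary (IEC) prefixes step by 1024.
SI_PREFIXES = ("", "K", "M", "G", "T", "P", "E", "Z", "Y")
IEC_PREFIXES = ("", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi")
SI_STEP = 1000
IEC_STEP = 1024


def humanize(
    value: float,
    suffix: str,
    step: int = SI_STEP,
    prefixes: Sequence[str] = SI_PREFIXES,
) -> str:
    """Scale value down by `step` until it fits, then attach the prefix."""
    index = 0
    while abs(value) >= step and index < len(prefixes) - 1:
        value /= step
        index += 1
    return f"{value:.2f} {prefixes[index]}{suffix}"


print(humanize(1024, "B"))                          # 1.02 KB
print(humanize(1024, "B", IEC_STEP, IEC_PREFIXES))  # 1.00 KiB
```

Because the same helper is used for non-byte quantities such as CPU or network cost, defaulting to the decimal step keeps the K/M/G labels consistent across all of them.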
I had the same concern as @etr2460 , learned something new here (had to google to double check) 👍
superset/db_engine_specs/presto.py
db_engine_spec.execute(cursor, sql)
polled = cursor.poll()
while polled:
Is this the only way to tell if the query is finished? This seems a little sketchy, can we not pass a callback or something on success?
Yeah, let me simplify this.
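One possible shape for a simpler wait loop is sketched below. It is not what the PR ended up with. It assumes a cursor whose `poll()` returns a status dict while the query is running and `None` once it has finished; `FakeCursor` and `wait_for_results` are hypothetical names used only so the example is self-contained.

```python
import time
from typing import Any, Dict, List, Optional, Tuple


class FakeCursor:
    """Stand-in cursor: reports RUNNING for a few polls, then finishes."""

    def __init__(self, polls_until_done: int, rows: List[Tuple[Any, ...]]) -> None:
        self._remaining = polls_until_done
        self._rows = rows

    def poll(self) -> Optional[Dict[str, Any]]:
        if self._remaining > 0:
            self._remaining -= 1
            return {"stats": {"state": "RUNNING"}}
        return None  # assumption: None signals that the query is done

    def fetchall(self) -> List[Tuple[Any, ...]]:
        return self._rows


def wait_for_results(
    cursor: Any, interval: float = 0.0, timeout: float = 5.0
) -> List[Tuple[Any, ...]]:
    """Poll until the cursor reports completion, with an explicit timeout."""
    deadline = time.monotonic() + timeout
    while cursor.poll() is not None:
        if time.monotonic() > deadline:
            raise TimeoutError("query did not finish in time")
        time.sleep(interval)
    return cursor.fetchall()


rows = wait_for_results(FakeCursor(polls_until_done=3, rows=[("estimate",)]))
```

If the driver's `fetchall()` already blocks until the query completes, the loop may not be needed at all, which would be an even simpler option.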
superset/db_engine_specs/presto.py
result = json.loads(first)
estimate = result["estimate"]

def humanize(value, suffix):
can we add types here?
+1
superset/models/core.py
@@ -773,6 +773,13 @@ def name(self):
    def allows_subquery(self):
        return self.db_engine_spec.allows_subqueries

    @property
    def allows_cost_estimate(self):
add a return type?
+1
superset/views/core.py
    @expose("/estimate_query_cost/<database_id>/", methods=["POST"])
    @expose("/estimate_query_cost/<database_id>/<schema>/", methods=["POST"])
    @event_logger.log_this
    def estimate_query_cost(self, database_id, schema=None):
add types
+1
superset/db_engine_specs/base.py
@@ -148,6 +149,10 @@ class BaseEngineSpec:
    max_column_name_length = 0
    try_remove_schema_from_table_name = True

    @classmethod
    def get_allow_cost_estimate(cls, version=None):
add types
+1
@etr2460, I added types and cleaned up the query execution.
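For readers following the "add types" threads, a typed, version-gated `get_allow_cost_estimate` might look like the sketch below. The minimum version `0.319` is taken from the test plan and is only assumed to be the cutoff; `parse_version` and `MIN_COST_ESTIMATE_VERSION` are hypothetical names, and the classes are stripped-down stand-ins rather than Superset's real engine specs.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '0.319' into (0, 319) for comparison."""
    return tuple(int(part) for part in version.split("."))


class BaseEngineSpec:
    @classmethod
    def get_allow_cost_estimate(cls, version: Optional[str] = None) -> bool:
        # Engines do not support cost estimation unless they say so.
        return False


class PrestoEngineSpec(BaseEngineSpec):
    # Assumption: the version quoted in the test plan is the minimum.
    MIN_COST_ESTIMATE_VERSION = "0.319"

    @classmethod
    def get_allow_cost_estimate(cls, version: Optional[str] = None) -> bool:
        if version is None:
            return False
        try:
            return parse_version(version) >= parse_version(
                cls.MIN_COST_ESTIMATE_VERSION
            )
        except ValueError:
            # Unparseable version strings are treated as unsupported.
            return False
```

Comparing integer tuples avoids the pitfall of comparing version strings lexicographically, where `"0.1000"` would sort before `"0.319"`.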
Would you mind making the test plan a little more robust? Could you test with the feature flag both enabled and disabled, with Presto DBs configured at a supported version and at an earlier version, and with non-Presto DBs? I'm sure you've tested other cases, but right now the test plan only references the happy path, so a bit more detail would be great.
Other than that and my 2 comments here, this LGTM! I'll approve to unblock.
CATEGORY
Choose one
SUMMARY
We recently added support to Presto for estimating the number of bytes scanned (trinodb/trino#806), and we'd like to surface that information to SQL Lab users before they actually run a query.
This PR extends the DB specs with an `allows_cost_estimate` attribute and associated methods, allowing pre-execution costs to be computed from DBs that support it.
In order to use it, the feature flag `ESTIMATE_QUERY_COST` must be enabled, and it needs to be explicitly turned on for each database that supports query cost estimation. When all those conditions are met, a new button will show up in SQL Lab, allowing users to run cost estimates for the whole query or for the selected SQL.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
DBs where the feature is not supported or not enabled are unmodified:
Here's a Presto DB:
Waiting for results:
The result:
And how errors (timeout, syntax errors) are surfaced:
TEST PLAN
Tested with a Presto cluster that supports query cost estimation, running version 0.319 and with the feature enabled via extra:
Additionally, I tested:
ADDITIONAL INFORMATION
REVIEWERS
@etr2460 @mistercrunch