-
Notifications
You must be signed in to change notification settings - Fork 13.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(db_engine_specs): big query cost estimation #21325
feat(db_engine_specs): big query cost estimation #21325
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Congrats on making your first PR and thank you for contributing to Superset! 🎉 ❤️
We hope to see you in our Slack community too!
Thanks for running the tests. I'll check on the lint errors and try to see why the cypress test is showing the "cant create chart from query error. Maybe the estimate cost button is changing the selector of the other button? Lets see. |
Codecov Report
@@ Coverage Diff @@
## master #21325 +/- ##
==========================================
- Coverage 67.03% 66.99% -0.04%
==========================================
Files 1859 1859
Lines 71043 71090 +47
Branches 7776 7776
==========================================
+ Hits 47622 47630 +8
- Misses 21397 21436 +39
Partials 2024 2024
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
@villebro is it okay like this? The cypress E2E tests sometimes work sometimes don't, but everything else is fixed and working. |
Do I need to do something else to merge this? |
Wowwww!!! This is a so, so much needed feature!! |
@zamar-roura would you be able to rebase the PR to resolve the conflicts? |
f855f45
to
58705fa
Compare
@villebro Tell me if you need anything else. I'll be The reducer/SqlLab.js needed some fixes in the ESTIMATE QUERY action as action.query didn't have any sqlEditorId parameter. It always comes as action.query.id not action.query.sqlEditorId. |
Thanks @zamar-roura - I'll do a review + test pass tomorrow 👍 |
Hi @villebro, I'm at your full disposal to improve the merge as much as needed, it's my first merge in superset, if anything is not as it should be tell me and I'll put my full effort into it. |
@zamar-roura I haven't forgotten about this, I've just been insanely busy. But I'll do my absolute best to review this today! |
@villebro I totally understand, thank you so much for the quick reply! |
Restarted CI, the cypress workflow seems to be having issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution and for fixing this, as this feature appears to be broken right now 🙁 Added a few comments after my first pass review and test.
superset/config.py
Outdated
# The feature is off by default, and currently only supported in Presto and Postgres, | ||
# and Bigquery. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current config, docs and the examples in config.py
are in a pretty horrible state right now. I observed the following:
- The
ESTIMATE_QUERY_COST
config is in fact a feature flag and needs to be moved toDEFAULT_FEATURE_FLAGS
:Line 879 in 49f1cfc
ESTIMATE_QUERY_COST = False - This example is broken:
Line 905 in 49f1cfc
# "QUERY_COST_FORMATTERS_BY_ENGINE": {"postgresql": postgres_query_cost_formatter}, # QUERY_COST_FORMATTERS_BY_ENGINE = {"postgresql": postgres_query_cost_formatter}
. This should also be updated. - There are no docs for this feature. In the deprecated docs there's a SQL Lab section that seems to have been lost over the years, and the content is also incorrect: https://apache-superset.readthedocs.io/en/latest/sqllab.html I don't expect you to add docs for this, but just mentioning this here to call attention to it. FYI @rusackas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the callout... putting it on my very long to-do list!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the things and added some more comments.
About the docs I would love to fix the content. Only thing is that the superset/blob/master/docs/sqllab.rst file doesnt even exist now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zamar-roura let's leave the docs for a follow-up PR 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few last minute comments/questions
superset/db_engine_specs/bigquery.py
Outdated
# Format Bytes. | ||
# TODO: In case more db engine specs need to be added, | ||
# this should be made a function outside this scope. | ||
byte_division = 1024 | ||
if hasattr(query_job, "total_bytes_processed"): | ||
query_bytes_processed = query_job.total_bytes_processed | ||
if query_bytes_processed // byte_division == 0: | ||
byte_type = "B" | ||
total_bytes_processed = query_bytes_processed | ||
elif query_bytes_processed // (byte_division**2) == 0: | ||
byte_type = "KB" | ||
total_bytes_processed = round(query_bytes_processed / byte_division, 2) | ||
elif query_bytes_processed // (byte_division**3) == 0: | ||
byte_type = "MB" | ||
total_bytes_processed = round( | ||
query_bytes_processed / (byte_division**2), 2 | ||
) | ||
else: | ||
byte_type = "GB" | ||
total_bytes_processed = round( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a future cleanup PR, we should really harmonize thees across specs (I seem to recall someone already did this, but I was unable to find it..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, after pushing this I saw that there was an early PR of 2022 that did humanize it #18694
Hi @villebro. I don't remember if there was anything else we needed to do. Its a busy end of the year and also full of festivities so if you want we can see what´s needed after all this ends, cheers! |
@villebro I'll be checking from time to time to resolve merge conflicts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for your patience with this PR. Also, thanks for bringing up #18694 which I appear to have dropped the ball on 🙁
SUMMARY
We are adding four methods inside superset/db_engine_specs/bigquery.py
For now only Postgres and Presto had cost estimation capabilities. For that they used the cursor object.
Bigquery needs to use a dry run to get the query cost estimation. Sadly, you can't use the cursor to execute dry runs (I have tried). And that's why there is a need to override the two functions. estimate_statement_cost and estimate_query_cost. The other two methods get_allow_cost_estimate and query_cost_formatter are just added because the base class doesn't have functionality or gives a False.
Also the QueryEditor.py object doesn't have sqlEditorId anymore in it's variables and the Cost Estimation Reducer is looking for it, thus giving error, that is also changed.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
BEFORE
After
TESTING INSTRUCTIONS
Add ESTIMATE_QUERY_COST = True inside superset/config.py.
Add a valid BigQuery database, activate option Enable query cost estimation inside advanced options.
Go to SQL Lab and put a valid query, then press Estimate Cost button.
ADDITIONAL INFORMATION