feat: Scheduling queries from SQL Lab #7416
Conversation
clicked, a modal will show up where the user can add the metadata required for
scheduling the query.

This information can then be retrieved from the endpoint `/savedqueryviewapi/api/read`
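As a rough illustration, an external scheduler could poll that endpoint along these lines (the host, the authentication, and the `result` key of the response are assumptions, not something specified by this PR):

```python
# Illustrative only: polling the saved-query endpoint from an external scheduler.
import requests

SUPERSET_URL = "https://superset.example.com"  # hypothetical host

def fetch_saved_queries(session: requests.Session):
    """Return saved queries (with their scheduling metadata) from Superset."""
    resp = session.get(f"{SUPERSET_URL}/savedqueryviewapi/api/read")
    resp.raise_for_status()
    # The exact response shape comes from FAB; the "result" key is assumed here.
    return resp.json().get("result", [])
```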
point of curiosity: Is this URL something we have control over? (I don't see it specified anywhere, so I assume generated from a base class). It feels like a more idiomatic "RESTful" URL for a platform-y backend would be GET /api/savedqueries/
It's automatically generated by FAB. There's a view called `SavedQueryViewApi`, and FAB automatically adds the `.../api/*` endpoints for interacting with it.
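For reference, a minimal sketch of how such a FAB view gets those generated endpoints (the import path and column list below are assumptions, not the exact Superset code):

```python
# Sketch: a FAB ModelView like this automatically gets .../api/read, .../api/create, etc.
from flask_appbuilder import ModelView
from flask_appbuilder.models.sqla.interface import SQLAInterface

from superset.models.sql_lab import SavedQuery  # import path assumed


class SavedQueryViewApi(ModelView):
    datamodel = SQLAInterface(SavedQuery)
    # Columns chosen for illustration; the real view defines its own list.
    list_columns = ["label", "schema", "sql", "extra_json"]

# Registering the view (without a menu entry) makes FAB expose
# GET /savedqueryviewapi/api/read among its generated endpoints:
# appbuilder.add_view_no_menu(SavedQueryViewApi)
```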
Codecov Report
@@ Coverage Diff @@
## lyft-release-sp8 #7416 +/- ##
====================================================
- Coverage 65% 64.93% -0.07%
====================================================
Files 429 430 +1
Lines 21081 21114 +33
Branches 2337 2340 +3
====================================================
+ Hits 13704 13711 +7
- Misses 7253 7279 +26
Partials 124 124
Continue to review full report at Codecov.
I'm currently writing the SIP for this. Edit: #7425
  onSchedule: PropTypes.func,
};

const defaultProps = {
  defaultLabel: t('Undefined'),
Should we provide default values for the other props?
Ah, good point. I'll mark the other ones as required (sql, schema, dbid).
👍
Merge commit '90eef51' from lyft-release-sp8
Hello! I'm working at M6 (a French TV group), and we're working on a job that looks a bit like scheduled queries, so I was wondering if we could exchange on how things are planned in Superset and see if we could help each other.

Our job is an "Aggregator", which basically runs aggregation queries on a daily basis. We structured it similarly to what you did with scheduled queries: a config file (currently hand-made), an Airflow DAG which creates other DAGs based on the config files, and a Scala/Spark job to run the query (and monitor the number of lines in Grafana).

You talked about Airflow integration; how did you manage that part at Lyft? Our need seems more specific, but I think we can help. On the config part, there is some info missing for us (full/incremental query, partitions of input tables to put sensors on, partitions of the output table); is this part "adjustable"?

Thanks for your time, I hope we can help and share our work! Have a nice day,
Hi, @vnourdin! On the Superset side, I tried to make this as agnostic as possible. The example config is meant for Airflow, but by changing the config you can add any metadata you need to successfully schedule a query. At Lyft we're prototyping this with Hive, and later we'll add support for Presto. We're also planning to add an option to upload the data to our Druid database. My co-worker @ArgentFalcon wrote the Airflow pipeline, and I'll see if he can share it with you.

For our proof-of-concept we expect from the user a query filtered by

We also support (or will support) depending on a daily partition of a Hive table or on another Airflow DAG. The way we do that is by adding dependencies (see the modal) like

It should be easy to support different workflows. You could add a checkbox asking the user if this is a full or incremental query, for example, and the Airflow DAG would work differently depending on the value.

Hope this helps!
Curious about how you handle different schedule intervals. Since Airflow supports only one interval per DAG, I'm assuming you generate different DAGs for each schedule_interval? Say, one DAG per distinct interval?

It'd be cool to see the code used for the DAG generation for reference, especially around the URI processing into deps, though I'm guessing there are some Lyft-specific operators in the mix.
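This is not Lyft's actual pipeline, but a minimal sketch of the "one generated DAG per saved query" idea being asked about here. The metadata field names (`schedule_interval`, `start_date`, ...) depend entirely on the JSON schema configured in Superset, and the operator is only a placeholder:

```python
# Illustrative DAG factory: one Airflow DAG per scheduled saved query.
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # placeholder operator

SUPERSET_URL = "https://superset.example.com"  # hypothetical

def build_dag(query):
    # Assumes extra_json has already been parsed into a dict; the API may
    # return it as a JSON string, in which case it needs json.loads first.
    info = query["extra_json"]["schedule_info"]
    dag = DAG(
        dag_id=f"scheduled_query_{query['id']}",
        schedule_interval=info["schedule_interval"],
        start_date=datetime.fromisoformat(info["start_date"]),
        catchup=False,
    )
    # A real pipeline would use a Hive/Presto operator and wire up the
    # dependencies (partition sensors, upstream DAGs) from the metadata.
    BashOperator(
        task_id="run_query",
        bash_command="echo 'run the saved SQL here'",
        dag=dag,
    )
    return dag

# Airflow discovers DAGs via module-level globals in this file.
for q in requests.get(f"{SUPERSET_URL}/savedqueryviewapi/api/read").json().get("result", []):
    if q.get("extra_json", {}).get("schedule_info"):
        globals()[f"scheduled_query_{q['id']}"] = build_dag(q)
```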
Thanks, that helps. We're really interested in exchanging with @ArgentFalcon about the Airflow part! As our need is a bit urgent, we'll finish it the way we began, but we aim to switch later and integrate our work with Superset scheduled queries. We also have a
@betodealmeida is your co-worker still planning on sharing his Airflow pipeline? We're also working on scheduling queries from Superset and using Airflow to schedule them, so I'm really interested in how you guys built your Airflow pipeline :)

I just pinged @ArgentFalcon on Slack.

Thanks for your patience, I'm trying to figure out how to paste the information in this DAG without getting it too bogged down by Lyft-specific stuff.

Thanks @ArgentFalcon. I think the whole DAG is interesting for us, but the most interesting parts would be how you handle the re-scheduling of different queries (do you dynamically create a DAG for every query returned by the API and use that scheduling data to create it?) and how you plan to handle different databases in the future. But if you can figure out how to paste most of the information from this DAG here, that would be great :)

@ArgentFalcon thanks for your contribution, did you manage to fix the issues with your DAG?

Hey @aurelsaillant, thanks for your patience. We did manage to fix the issue. I'm currently double-checking that I can release the code snippet just so I don't get in trouble.

Hello @ArgentFalcon, how did your checking go? Thanks again for your contribution!

@ArgentFalcon friendly 🏓
.. code-block:: python

    FEATURE_FLAGS = {
Bumped into an error today where the SQL Lab page fails to render because somehow I set `FEATURE_FLAGS['SCHEDULED_QUERIES']` to `True` in my local configs.

It is a surprise to me that complex objects are added into "flags". I thought it was only about on/off toggles. This is quite confusing and would make it difficult to extend the `FEATURE_FLAGS` feature structurally (say, moving the management of it to the database or an external API).

Can we maybe introduce a new config value `SCHEDULED_QUERIES_SCHEMA` and keep `FEATURE_FLAGS` simple?
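For context, the kind of value `SCHEDULED_QUERIES` carries is a JSON-schema-style form definition rendered by the modal. A trimmed-down sketch (key and field names here are illustrative, not copied verbatim from the docs added in this PR):

```python
# Illustrative config: SCHEDULED_QUERIES holds a form schema, not a boolean.
FEATURE_FLAGS = {
    'SCHEDULED_QUERIES': {
        'JSONSCHEMA': {
            'title': 'Schedule',
            'type': 'object',
            'properties': {
                'start_date': {'type': 'string', 'format': 'date-time', 'title': 'Start date'},
                'end_date': {'type': 'string', 'format': 'date-time', 'title': 'End date'},
                'schedule_interval': {'type': 'string', 'title': 'Schedule interval'},
                'dependencies': {
                    'type': 'array',
                    'title': 'Dependencies',
                    'items': {'type': 'string'},
                },
            },
        },
    },
}
```

Moving that blob into a dedicated `SCHEDULED_QUERIES_SCHEMA` config, as suggested above, would leave `FEATURE_FLAGS['SCHEDULED_QUERIES']` as a plain on/off toggle.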
SUMMARY
This PR introduces a lightweight way of scheduling queries in SQL Lab. If the feature flag `SCHEDULED_QUERIES` is enabled with proper configuration, a button called "Schedule Query" will show up in SQL Lab. The button allows queries to be saved with extra metadata that lets an external scheduler run them periodically by polling the `/savedqueryviewapi/api/read` endpoint. The sample configuration can be changed or expanded to support whatever metadata is needed, depending on the scheduler. We tested it successfully with Apache Airflow at Lyft.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
This will generate the following payload in `/savedqueryviewapi/api/read`:
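(The original payload is not reproduced here; as a rough illustration only, with every field name assumed, a returned record might look like the sketch below, where the scheduling metadata ends up serialized inside the saved query's `extra_json`.)

```python
# Illustrative shape only; not the actual payload from the PR description.
{
    "id": 42,
    "label": "my scheduled query",
    "schema": "default",
    "sql": "SELECT ...",
    "extra_json": '{"schedule_info": {"schedule_interval": "@daily", '
                  '"start_date": "2019-05-01T00:00:00", "dependencies": []}}',
}
```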
TEST PLAN
Tested end-to-end at Lyft with a Hive query.
ADDITIONAL INFORMATION
REVIEWERS
@mistercrunch @datability-io @DiggidyDave @khtruong @ArgentFalcon