Unmodified queries should not refresh if the underlying datasource has not changed. #362
@mconnormoz How would we know if the underlying datasource hasn't changed without running a query?
The least complicated way, I'd imagine, would be maintaining a small database that gets updated with the most recent update timestamp (set at the end of the various aggregation jobs). We already know when the query was last run. Treat it like HTTP If-Modified-Since and only rerun a query if its last result predates that timestamp.
(Emailed reply to Alison's question of Sun, Jun 17, 2018, 6:05 PM.)
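A minimal sketch of the If-Modified-Since idea, assuming a hypothetical `source_updates` table that each aggregation job writes to when it finishes. The table, its columns, and the `mark_source_updated` / `should_refresh` helpers are illustrative only and are not part of Redash; SQLite stands in for "a small database".

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical side database: one row per data source, holding the time the
# most recent aggregation job for that source finished.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_updates (source TEXT PRIMARY KEY, updated_at TEXT NOT NULL)"
)


def mark_source_updated(conn, source, when):
    """Hypothetical hook, called at the end of an aggregation job."""
    conn.execute(
        "INSERT OR REPLACE INTO source_updates (source, updated_at) VALUES (?, ?)",
        (source, when.isoformat()),
    )
    conn.commit()


def should_refresh(conn, source, last_run_at):
    """Like HTTP If-Modified-Since: rerun only if the source changed after the last run."""
    row = conn.execute(
        "SELECT updated_at FROM source_updates WHERE source = ?", (source,)
    ).fetchone()
    if row is None:
        # No update timestamp recorded: we cannot show the data is unchanged, so rerun.
        return True
    return datetime.fromisoformat(row[0]) > last_run_at


# Example: the job finished at 06:00 and the query last ran at 12:00, so skip the rerun.
last_run = datetime(2018, 6, 17, 12, 0, tzinfo=timezone.utc)
mark_source_updated(conn, "telemetry", datetime(2018, 6, 17, 6, 0, tzinfo=timezone.utc))
print(should_refresh(conn, "telemetry", last_run))  # False
```

Treating a missing timestamp as "changed" errs on the side of running the query, which keeps data sources without an aggregation hook behaving as they do today.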
@mconnormoz I'm trying to understand whether this should be moved into a ticket in the upstream project or whether it is a purely Mozilla-specific feature. Part of that is understanding which data sources we're talking about specifically. Redash supports many different types of query runners/data sources: if we can implement this generically, it would go upstream; if it's only possible for the data sources we run aggregation jobs for, it would be Mozilla-specific. Could you elaborate on what you mean by "set at the end of the various aggregation jobs"?
There's another way to enable this feature: use the re:dash API and create an Airflow operator that runs queries after the ETL job has run. Then a query never reruns unless the underlying data has changed. IMHO it's a big ask to update an outside database whenever ETL jobs finish and then have re:dash check that database to see whether the underlying data has changed since a previous run (though the idea makes sense).
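A minimal sketch of the call such an operator could make once the ETL task succeeds. It assumes Redash's `POST /api/queries/<id>/refresh` endpoint and the `Authorization: Key <api_key>` header, which, as the following comment notes, are not officially supported for clients other than the Redash frontend. The function name, base URL, query ID, and key are placeholders. The sketch only builds the request so it stays independent of Airflow; in a DAG it would be sent (e.g. with `urllib.request.urlopen`) from a task downstream of the ETL task.

```python
import urllib.request


def build_refresh_request(base_url: str, query_id: int, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request asking Redash to re-execute one query.

    Hypothetical helper: meant to run only after the ETL job has finished, so the
    query is refreshed exactly when its underlying data has changed.
    """
    url = f"{base_url.rstrip('/')}/api/queries/{query_id}/refresh"
    return urllib.request.Request(
        url,
        data=b"",  # empty POST body
        headers={"Authorization": f"Key {api_key}"},  # assumed Redash API-key header
        method="POST",
    )


req = build_refresh_request("https://redash.example.com/", 123, "API_KEY")
print(req.get_method(), req.full_url)
# POST https://redash.example.com/api/queries/123/refresh
```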
FTR, the Redash API isn't officially supported for clients other than the Redash frontend (even though there are experiments with a Python library that accesses part of the API). IOW, your plan would require a concerted effort to continue that work and to get upstream to accept it as an alternative client (which raises questions about API stability, etc.). That's something to be doubly sure about for production queries in Airflow.
@jezdez Very good points; it would require a stable API and a supported client. I was under the assumption that the API was stable, since we've used it across …
@fbertsch The API of the upstream project is stable only for the Redash frontend, since the two are shipped together, but it's not guaranteed or versioned (e.g. …). Our fork doesn't diverge from upstream, so we don't guarantee its stability either. Hence, redash-client and stmocli inherit this API instability, which is technical debt we'll need to pay eventually. Hopefully the situation in Redash upstream will have improved by then, but I think it'll require initiative from our side (there is interest upstream).
Related to #187, but on the refresh side. If we know that neither the underlying datasource nor the query has changed, a refresh should be an effective no-op on the DB side. This would apply to all non-edit views, especially dashboards.
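One way to make such a refresh a no-op, sketched under two assumptions: cached results are keyed by a hash of the query text, and each data source exposes a last-updated timestamp (for example from a side table like the one proposed earlier in the thread). `CachedResult`, `hash_query`, and `refresh` are hypothetical names, and `run_query` stands in for whatever actually hits the database.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Optional


@dataclass
class CachedResult:
    query_hash: str
    retrieved_at: datetime
    data: object


def hash_query(query_text: str) -> str:
    # Any edit to the query text yields a different key, so edited queries always rerun.
    return hashlib.sha256(query_text.strip().encode("utf-8")).hexdigest()


def refresh(
    query_text: str,
    source_updated_at: datetime,
    cache: Dict[str, CachedResult],
    run_query: Callable[[str], object],
    now: datetime,
) -> CachedResult:
    key = hash_query(query_text)
    cached: Optional[CachedResult] = cache.get(key)
    # Unchanged query and no source update since the cached run: skip the DB entirely.
    if cached is not None and cached.retrieved_at >= source_updated_at:
        return cached
    # Otherwise execute the query and cache the new result under the same key.
    result = CachedResult(key, now, run_query(query_text))
    cache[key] = result
    return result
```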
If the cached result is newer than the currently displayed view, we should fetch that result/graph instead (similar to reloading the page), but without hitting the DB.
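The display-side rule above can be sketched as a small decision function, assuming the page knows when its displayed result was retrieved, the timestamp of the newest cached result for the same query (if any), and when the data source was last updated. `Action` and `choose_action` are hypothetical names.

```python
from datetime import datetime
from enum import Enum
from typing import Optional


class Action(Enum):
    KEEP = "keep the displayed result"
    SWAP_IN_CACHED = "show the newer cached result (no DB query)"
    RUN_QUERY = "execute the query against the DB"


def choose_action(
    displayed_at: datetime,
    cached_at: Optional[datetime],
    source_updated_at: datetime,
) -> Action:
    # The displayed result is current and nothing newer is cached: do nothing.
    if displayed_at >= source_updated_at and (cached_at is None or cached_at <= displayed_at):
        return Action.KEEP
    # A newer cached result computed after the last source update exists:
    # like reloading the page, but without hitting the DB.
    if cached_at is not None and cached_at >= source_updated_at and cached_at > displayed_at:
        return Action.SWAP_IN_CACHED
    # Every known result predates the last source update, so a real run is needed.
    return Action.RUN_QUERY
```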
cc @fbertsch