
Unmodified queries should not refresh if the underlying datasource has not changed. #362

Open
mconnormoz opened this issue Apr 5, 2018 · 7 comments

@mconnormoz

Related to #187 but on the refresh side. If we know that neither the underlying datasource nor the query has changed, a refresh should be an effective no-op on the DB side. This would apply to all non-edit views, especially dashboards.

If the cached result is newer than the currently displayed view, we should fetch that result/graph instead (similar to reloading the page), but without hitting the DB.
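A minimal sketch of the refresh policy described above, written in Python since that is Redash's backend language. It assumes a hypothetical `source_updated_at` timestamp recording when the underlying datasource last changed; how to get that signal without running the query is still open in this thread, so the sketch treats a missing timestamp as "unknown" and falls back to running the query. `RefreshAction`, `CachedResult`, and `decide_refresh` are illustrative names, not Redash code.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class RefreshAction(Enum):
    NOOP = "noop"                  # displayed result is already current
    SERVE_CACHED = "serve_cached"  # a newer cached result exists; fetch it without hitting the DB
    RUN_QUERY = "run_query"        # query or data changed (or unknown); execute against the DB


@dataclass
class CachedResult:
    query_hash: str         # hash of the query text this result was produced from
    retrieved_at: datetime  # when this result was computed


def decide_refresh(
    query_hash: str,
    source_updated_at: Optional[datetime],  # HYPOTHETICAL: last change of the underlying data
    cached: Optional[CachedResult],
    displayed_at: Optional[datetime],       # age of the result currently shown in the view
) -> RefreshAction:
    """Hypothetical refresh policy for non-edit views such as dashboards."""
    # No cached result, or it was computed from different query text: must run.
    if cached is None or cached.query_hash != query_hash:
        return RefreshAction.RUN_QUERY
    # We don't know whether the data changed: keep today's behaviour and run.
    if source_updated_at is None:
        return RefreshAction.RUN_QUERY
    # The underlying data changed after the cached result was computed.
    if source_updated_at > cached.retrieved_at:
        return RefreshAction.RUN_QUERY
    # Cache is still valid; only fetch it if the view shows something older.
    if displayed_at is None or cached.retrieved_at > displayed_at:
        return RefreshAction.SERVE_CACHED
    return RefreshAction.NOOP
```

The ordering is deliberately conservative: every "don't know" path ends in `RUN_QUERY`, so the only cases that skip the DB are the ones where both the query and the data are known to be unchanged.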

cc @fbertsch

@rafrombrc rafrombrc added this to the 15 milestone May 30, 2018
@alison985

@mconnormoz How would we know if the underlying datasource hasn't changed without running a query?

@mconnormoz

mconnormoz commented Jun 17, 2018 via email

@rafrombrc rafrombrc modified the milestones: 15, 16 Jul 11, 2018
@jezdez

jezdez commented Jul 19, 2018

@mconnormoz I'm trying to understand whether this should be moved into a ticket in the upstream project or whether it is a purely Mozilla-specific feature. Part of that is understanding which data sources we're talking about specifically.

E.g. Redash supports many different types of query runners/data sources, so if we can implement this generically it'd go upstream; if it's only possible for the data sources we run aggregation jobs for, it would be Mozilla-specific.

Could you elaborate on what you mean by "set at the end of the various aggregation jobs"?

@fbertsch

There's another way to enable this feature - use the re:dash API, and create an Airflow operator that runs queries after the ETL job has run. Then the query never reruns unless the underlying data has changed.

IMHO it's a big ask to update some outside database when ETL jobs finish and then have re:dash check that database to see whether the underlying data has changed since a previous run (though the approach makes sense).
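One way to picture the "outside database" approach is a marker table that each ETL job updates when it finishes and that the refresh logic reads before deciding whether to rerun a query. The sketch uses an in-memory SQLite table purely for illustration; the table name `etl_completion` and the functions `mark_etl_finished`, `last_etl_finish`, and `data_changed_since` are hypothetical and not part of re:dash or Airflow.

```python
import sqlite3
from datetime import datetime
from typing import Optional

# HYPOTHETICAL marker table: one row per dataset, written by the ETL job when it finishes.
SCHEMA = """
CREATE TABLE IF NOT EXISTS etl_completion (
    dataset     TEXT PRIMARY KEY,
    finished_at TEXT NOT NULL  -- ISO-8601 timestamp
)
"""


def mark_etl_finished(conn: sqlite3.Connection, dataset: str, finished_at: datetime) -> None:
    """Last step of an ETL job (e.g. an Airflow task): record when the data was refreshed."""
    # INSERT OR REPLACE overwrites the previous marker row for this dataset.
    conn.execute(
        "INSERT OR REPLACE INTO etl_completion (dataset, finished_at) VALUES (?, ?)",
        (dataset, finished_at.isoformat()),
    )
    conn.commit()


def last_etl_finish(conn: sqlite3.Connection, dataset: str) -> Optional[datetime]:
    """Return when the ETL job for `dataset` last finished, or None if never recorded."""
    row = conn.execute(
        "SELECT finished_at FROM etl_completion WHERE dataset = ?", (dataset,)
    ).fetchone()
    return datetime.fromisoformat(row[0]) if row else None


def data_changed_since(conn: sqlite3.Connection, dataset: str, retrieved_at: datetime) -> bool:
    """True if a cached result computed at `retrieved_at` may be stale."""
    finished = last_etl_finish(conn, dataset)
    # No marker: we cannot prove the data is unchanged, so assume it changed.
    if finished is None:
        return True
    return finished > retrieved_at
```

A missing marker is treated as "changed", so datasets without ETL bookkeeping keep today's always-run behaviour; only datasets whose jobs write markers get the no-op refresh.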

@jezdez

jezdez commented Jul 19, 2018

FTR, the Redash API isn't officially supported for clients other than the Redash frontend (even though there is an experimental Python library that accesses part of the API). IOW, your plan would require a concerted effort to continue that work and to have upstream accept it as an alternative client (which raises questions about API stability etc). That's something to be doubly sure about for production queries in Airflow.

@fbertsch

@jezdez very good points, it would require a stable API and a supported client. I was under the assumption that the API was stable, since we've used it across redash_client and stmocli.

@jezdez

jezdez commented Jul 20, 2018

@fbertsch The upstream project's API is stable only for the Redash frontend, since the two are shipped together; for other uses it is neither guaranteed nor versioned (e.g. /api/v1/..), and not fully documented either. The current docs are early and only tentatively describe the API.

Our fork doesn't diverge from upstream in this respect, so we don't guarantee the API's stability either. Hence, redash-client and stmocli inherit this API instability, which is technical debt we'll need to pay eventually. Hopefully the situation in Redash upstream will have improved by then, but I think it'll require initiative from our side (there is interest upstream).


5 participants