Too easy to accidentally schedule queries for long periods that cost a lot of money #949

wlach · 2019-05-06T14:02:37Z

(filing this on mozilla's redash fork because the details are particular to us)

A couple months ago I did some brainstorming on how to reduce our Athena spend via redash queries:

https://docs.google.com/document/d/1fZDxl-BiB_OXu5NEEMrWmviNy2y3B8iu_MBQHycaAtQ/edit#heading=h.3kudzbqcx32n

As you probably saw, the iodide dashboard (internal only, sorry) plus my nagging led to a substantial decrease in cost. But I think we should also try to implement some of the ideas suggested in that doc; in particular, @chutten's suggestions about expiry seem valuable.

I think some combination of the following would be super helpful:

* Warn about scheduling too often (> once per day)
* Mandatory expiry of queries (with an email warning à la atmo) after some time interval, say 6 months, unless renewed.

There are other things we could also do (e.g. put a dollar figure in the query window, like GCP does) but I think this is a good place to start.

It seems like the latest version of redash already has some notion of query expiry judging from this issue:

getredash#3375

It doesn't seem like that's deployed though? I'm guessing this feature doesn't include the two suggestions above either?

/cc @rafrombrc @jezdez @washort

jezdez · 2019-05-09T10:44:27Z

So looking at getredash#3375 a bit, this relates to @emtwo's work on porting our feature to let a query schedule end on a specific date:

As the issue states, there is no periodic Celery task yet to clear the schedule field, which would make it less expensive to look for queries to schedule. So this is mostly an optimization step for reducing the load on the worker system.

What I believe you're describing needs to happen separately. I think both features can happen via our extension redash-stmo:

> I think some combination of the following would be super helpful:
>
> * Warn about scheduling too often (> once per day)

Warning about scheduling too often won't work without some code work, but I wonder if we could either:

a) simply remove all query options smaller than 24 hours (via the already existing REDASH_QUERY_REFRESH_INTERVALS env var)

b) add a periodic Celery task via redash-stmo that would check the schedule for new queries and send emails to the query owners asking them to update it?
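As a rough illustration of option (a), here is a minimal sketch that derives a value for REDASH_QUERY_REFRESH_INTERVALS containing only intervals of 24 hours or longer. The interval list is the default quoted elsewhere in this thread; the helper names `at_least_daily` and `to_env_value` are made up for this example and are not part of Redash or redash-stmo.

```python
# Default refresh intervals in seconds, as quoted elsewhere in this thread.
DEFAULT_INTERVALS = [
    60, 300, 600, 900, 1800, 3600, 7200, 10800, 14400, 18000, 21600,
    25200, 28800, 32400, 36000, 39600, 43200, 86400, 604800, 1209600,
    2592000,
]

DAY = 24 * 60 * 60  # 86400 seconds


def at_least_daily(intervals, minimum=DAY):
    """Keep only intervals of `minimum` seconds or longer (hypothetical helper)."""
    return [seconds for seconds in intervals if seconds >= minimum]


def to_env_value(intervals):
    """Format intervals in the comma-separated form used in this thread."""
    return ", ".join(str(seconds) for seconds in intervals)


daily_or_longer = at_least_daily(DEFAULT_INTERVALS)
env_value = to_env_value(daily_or_longer)
print(f"REDASH_QUERY_REFRESH_INTERVALS={env_value!r}")
```

This only computes the string; actually applying it is a deployment-side env var change.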

> * Mandatory expiry of queries (with an email warning ala atmo) after some time interval, say 6 months, unless renewed.

Sure, we could have a periodic Celery task (via redash-stmo) check new queries, set the expiry and warn the queries' authors about it (maybe in the same email about the schedule from the feature above)? Sadly, I'm not sure if that particular part of the front-end is easy to extend with an actual in-place warning.
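To make the expiry idea concrete, here is a minimal sketch of the decision logic such a periodic task might run. Everything in it is an assumption for illustration: `ScheduledQuery` is a hypothetical stand-in rather than a Redash model, the 182-day limit approximates the "say 6 months" above, and the 14-day warning window is invented. Sending the email and wiring this into Celery are left out.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple

MAX_SCHEDULE_AGE = timedelta(days=182)  # roughly the "6 months" suggested above
WARN_BEFORE = timedelta(days=14)        # assumed warning window, not from the thread


@dataclass
class ScheduledQuery:
    """Hypothetical stand-in for a scheduled query; not a Redash model."""
    id: int
    owner_email: str
    scheduled_on: date
    expires_on: Optional[date] = None


def apply_expiry(
    queries: List[ScheduledQuery], today: date
) -> Tuple[List[ScheduledQuery], List[ScheduledQuery]]:
    """Give queries without an expiry a default one, then sort them into
    (about to expire, already expired)."""
    to_warn, expired = [], []
    for query in queries:
        if query.expires_on is None:
            query.expires_on = query.scheduled_on + MAX_SCHEDULE_AGE
        if query.expires_on <= today:
            expired.append(query)
        elif query.expires_on - today <= WARN_BEFORE:
            to_warn.append(query)
    return to_warn, expired


today = date(2019, 11, 1)
queries = [
    ScheduledQuery(1, "a@example.com", date(2019, 1, 1)),
    ScheduledQuery(2, "b@example.com", date(2019, 5, 9)),
    ScheduledQuery(3, "c@example.com", date(2019, 10, 1)),
]
to_warn, expired = apply_expiry(queries, today)
print("warn:", [q.owner_email for q in to_warn])
print("expired:", [q.id for q in expired])
```

A renewal would then just amount to pushing `expires_on` further out before the task runs again.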


wlach · 2019-05-09T15:02:42Z

> As the issue states, there is no periodic Celery task yet to clear the schedule field so it becomes less expensive to look for queries to schedule, so this is mostly an optimizing step for reducing the load on the worker system.

Yeah, I was just taking that issue as an indication that the feature had landed. Didn't know it was @emtwo who implemented it, that's cool. :) Maybe she would have some thoughts on this area as well...

> What I believe you're describing needs to happen separately, I think both features can happen via our extension redash-stmo:
>
> > Warn about scheduling too often (> once per day)
>
> Warning about scheduling too often won't work without some code work, but I wonder if we could either:
>
> a) simply remove all query options smaller than 24 hours (via the already existing REDASH_QUERY_REFRESH_INTERVALS env var)
>
> b) add a periodic celery task via redash-stmo that would check the schedule for new queries and send emails to the query owners asking to update it?

Unfortunately there are probably legitimate reasons why someone might want to schedule a query more than once a day.

Maybe we could add some text like "Please be mindful that queries can be expensive and slow down the system for others: please only schedule to update as often as you really need."

I could even see a case for making this text customizable per installation, perhaps via the extension mechanism? Can we also set a maximum value for the "expires" field and make it mandatory to expire? Perhaps via some configuration options?

> > Mandatory expiry of queries (with an email warning ala atmo) after some time interval, say 6 months, unless renewed.
>
> Sure, we could have a periodic Celery task (via redash-stmo) check new queries, set the expiry and warn the queries' authors about it (maybe in the same email about the schedule from the feature above)? Sadly, I'm not sure if that particular part of the front-end is easy to extend with an actual in-place warning.

Now that I think about it, some generic email when a query is about to expire would be all we would need, if we implemented the above.

emtwo · 2019-08-08T19:56:09Z

The simplest/quickest solution right now would be to flip the sort order of the string that contains the schedule options in seconds. This string is stored in the environment variable REDASH_QUERY_REFRESH_INTERVALS.

The default value is 60, 300, 600, 900, 1800, 3600, 7200, 10800, 14400, 18000, 21600, 25200, 28800, 32400, 36000, 39600, 43200, 86400, 604800, 1209600, 2592000

I updated it to 2419200, 1209600, 604800, 432000, 172800, 86400, 43200, 39600, 36000, 32400, 28800, 25200, 21600, 18000, 14400, 10800, 7200, 3600, 1800, 900, 600, 300, 60 and the UI now looks like this:

Note that a strict reverse sort without any other changes still showed days before weeks, since the largest option was 30 days. So I switched "30 days" to "4 weeks" and added a couple more options for days, as seen in the screenshot.
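The ordering issue described above can be reproduced with a small sketch. It assumes each option is labeled with the largest unit that divides it evenly; the real Redash front-end may label and group options differently, and the `label` function is made up for this example.

```python
# Default intervals in seconds, as listed above.
DEFAULT_INTERVALS = [
    60, 300, 600, 900, 1800, 3600, 7200, 10800, 14400, 18000, 21600,
    25200, 28800, 32400, 36000, 39600, 43200, 86400, 604800, 1209600,
    2592000,
]

# Leading entries of the updated list given above.
NEW_LEADING = [2419200, 1209600, 604800, 432000, 172800, 86400]

UNITS = [("week", 604800), ("day", 86400), ("hour", 3600), ("minute", 60)]


def label(seconds):
    """Name an interval by the largest unit that divides it evenly
    (an assumed labeling rule, not taken from the Redash UI code)."""
    for unit, size in UNITS:
        if seconds % size == 0:
            count = seconds // size
            return f"{count} {unit}" + ("" if count == 1 else "s")
    return f"{seconds} seconds"


# Strict reverse sort of the defaults: 30 days is not a whole number of
# weeks, so a "days" entry lands ahead of the "weeks" entries.
reversed_default = [label(s) for s in sorted(DEFAULT_INTERVALS, reverse=True)[:4]]

# With 30 days replaced by 4 weeks, weeks come first, then days.
new_labels = [label(s) for s in NEW_LEADING]

print(reversed_default)
print(new_labels)
```

Under that assumed labeling rule, swapping 2592000 for 2419200 is what lets every week-based option sort ahead of the day-based ones.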

@rafrombrc @jasonthomas If we're all ok with this change, this would be an envar change you can make on your end, right @jasonthomas?

jezdez · 2019-08-09T09:43:09Z

Wow, awesome @emtwo!

jasonthomas · 2019-08-09T14:20:48Z

> this would be an envar change you can make on your end, right @jasonthomas?

This is what it looks like. I can add it if we are okay to proceed with this change.

wlach · 2019-08-09T16:49:56Z

Definitely a change in the right direction, though I still think that atmo-style query expiry is a good idea. It's still too easy to forget about a query after scheduling updates on it.

emtwo · 2019-08-09T16:59:01Z

@wlach Thanks for the reminder, I've opened a separate issue (#982) so we don't forget about it.

I suppose the question is whether we also still want to give a warning for frequently scheduled queries even with this new sort order...

rafrombrc assigned emtwo Aug 7, 2019

emtwo mentioned this issue Aug 9, 2019

Mandatory expiry of scheduled queries #982

Open

