purge_old_records: add --max-offset option and some additional logging statements #169

jrgm · 2020-01-10T23:20:32Z

I was working with the purge task in production to get it to run more effectively, and noticed a couple of things.

the default max-per-loop of 10 meant that each task was spending a large chunk of each cycle doing the select, and spinning a lot of database cpu. So I'll be using a max-per-loop of ~1000 going forward.
Due to the current select for old_records, each purge task winds up working on the same set of old_records. So this PR adds a --max-offset option to the task. The idea is that if each instance's task picks a random offset < max_offset, then it will be unlikely to have any 2 instances working on the same set of records. I'm thinking of running something like max-per-loop=1000 and max-offset=100000. The offset will add some additional cost to each select, so I'll have to play with those numbers to get the right tradeoff. But one thing I'm unclear about: the current query does order by replaced_at desc, uid desc. Is there some importance to that ordering? I.e., if I'm skipping to the middle of that list with an offset does that break some expectation?

This PR also adds a couple of logging statements I was interested in being able to see.

(Note: one gotcha with --max-offset is that if the number of purgeable rows is << max_offset, then no rows will be purged. I didn't try to defend against that in this PR. When the backlog is down to a manageable level I'll change max_offset to be zero (or maybe 10000 to keep a little bit of scatter). Or maybe not change it all: enough instances will get work to do by chance to keep to backlog under control).

rfk · 2020-01-10T23:25:50Z

the current query does order by replaced_at desc, uid desc.
Is there some importance that that ordering

I don't think it will affect the correctness of the script; most likely this is just trying to return the results in a consistent order to simplify testing.

rfk

Makes sense, thanks @jrgm!

one gotcha with --max-offset is that if the number of purgeable rows is << max_offset,
then no rows will be purged. I didn't try to defend against that in this PR.

Ack. This seems manageable, but may be worth writing down inline in the code somewhere for posterity.

tokenserver/scripts/purge_old_records.py

jrgm · 2020-01-10T23:43:06Z

I added a note in the docstring, and an inline comment about <<< max_offset.

… and when row.node is None

jrgm requested a review from rfk January 10, 2020 23:20

jrgm changed the title ~~purge old records additional logging and select offset~~ purge_old_records: add --max-offset option and some additional logging statements Jan 10, 2020

rfk approved these changes Jan 10, 2020

View reviewed changes

tokenserver/scripts/purge_old_records.py Show resolved Hide resolved

jrgm force-pushed the dockerpush.purge-old-records-additional-logging-and-select-offset branch from 9825ecf to 9e228ab Compare January 10, 2020 23:40

jrgm force-pushed the dockerpush.purge-old-records-additional-logging-and-select-offset branch 2 times, most recently from 9df5e00 to cc4e929 Compare January 11, 2020 02:26

purge_old_records: select with random offset range, log rows fetched,…

476be96

… and when row.node is None

jrgm force-pushed the dockerpush.purge-old-records-additional-logging-and-select-offset branch from cc4e929 to 476be96 Compare January 11, 2020 02:34

jrgm merged commit 8890f95 into master Jan 11, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

purge_old_records: add --max-offset option and some additional logging statements #169

purge_old_records: add --max-offset option and some additional logging statements #169

jrgm commented Jan 10, 2020 •

edited

Loading

rfk commented Jan 10, 2020

rfk left a comment

jrgm commented Jan 10, 2020

purge_old_records: add --max-offset option and some additional logging statements #169

purge_old_records: add --max-offset option and some additional logging statements #169

Conversation

jrgm commented Jan 10, 2020 • edited Loading

rfk commented Jan 10, 2020

rfk left a comment

Choose a reason for hiding this comment

jrgm commented Jan 10, 2020

jrgm commented Jan 10, 2020 •

edited

Loading