
purge_old_records: add --max-offset option and some additional logging statements #169

Merged

Conversation

jrgm
Contributor

@jrgm jrgm commented Jan 10, 2020

r? - @rfk

I was working with the purge task in production to get it to run more effectively, and noticed a couple of things.

  1. The default max-per-loop of 10 meant that each task was spending a large chunk of each cycle doing the select, and spinning a lot of database CPU. So I'll be using a max-per-loop of ~1000 going forward.

  2. Due to the current select for old_records, each purge task winds up working on the same set of old_records. So this PR adds a --max-offset option to the task. The idea is that if each instance's task picks a random offset < max_offset, it will be unlikely that any two instances are working on the same set of records. I'm thinking of running something like max-per-loop=1000 and max-offset=100000. The offset will add some additional cost to each select, so I'll have to play with those numbers to get the right tradeoff. But one thing I'm unclear about: the current query does order by replaced_at desc, uid desc. Is there some importance to that ordering? I.e., if I'm skipping to the middle of that list with an offset, does that break some expectation?
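The random-offset idea in item 2 can be sketched roughly as follows. This is a minimal illustration in Python; `build_purge_query`, the table name, and the column names are assumptions made for the sketch, not the script's actual API:

```python
import random

def build_purge_query(max_per_loop, max_offset):
    # Each instance picks its own random offset below max_offset, so it is
    # unlikely that any two instances select the same batch of old records.
    offset = random.randint(0, max_offset) if max_offset else 0
    query = (
        "SELECT uid, replaced_at FROM old_records "
        "ORDER BY replaced_at DESC, uid DESC "
        f"LIMIT {max_per_loop} OFFSET {offset}"
    )
    return query, offset

# e.g. the settings mentioned above: max-per-loop=1000, max-offset=100000
query, offset = build_purge_query(1000, 100000)
```

With offsets drawn uniformly from a range much larger than the batch size, the chance of two instances landing on overlapping batches stays small.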

This PR also adds a couple of logging statements I was interested in being able to see.

(Note: one gotcha with --max-offset is that if the number of purgeable rows is << max_offset, then no rows will be purged. I didn't try to defend against that in this PR. When the backlog is down to a manageable level I'll change max_offset to zero (or maybe 10000 to keep a little bit of scatter). Or maybe not change it at all: enough instances will get work to do by chance to keep the backlog under control.)
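The gotcha above is just how SQL LIMIT/OFFSET behaves: an offset past the end of the result set returns nothing. A tiny deterministic simulation, with plain Python lists standing in for the query result:

```python
def select_batch(rows, limit, offset):
    # Simulate SQL LIMIT/OFFSET: skip `offset` rows, then take up to `limit`.
    return rows[offset:offset + limit]

rows = list(range(50))             # only 50 purgeable rows in the backlog
full = select_batch(rows, 10, 0)   # offset 0: a full batch of work
empty = select_batch(rows, 10, 100)  # offset >> backlog size: empty batch, nothing purged
```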

@jrgm jrgm requested a review from rfk January 10, 2020 23:20
@jrgm jrgm changed the title purge old records additional logging and select offset purge_old_records: add --max-offset option and some additional logging statements Jan 10, 2020
@rfk
Contributor

rfk commented Jan 10, 2020

the current query does order by replaced_at desc, uid desc.
Is there some importance to that ordering?

I don't think it will affect the correctness of the script; most likely this is just trying to return the results in a consistent order to simplify testing.

Contributor

@rfk rfk left a comment

Makes sense, thanks @jrgm!

one gotcha with --max-offset is that if the number of purgeable rows is << max_offset,
then no rows will be purged. I didn't try to defend against that in this PR.

Ack. This seems manageable, but may be worth writing down inline in the code somewhere for posterity.

tokenserver/scripts/purge_old_records.py (review thread resolved)
@jrgm jrgm force-pushed the dockerpush.purge-old-records-additional-logging-and-select-offset branch from 9825ecf to 9e228ab Compare January 10, 2020 23:40
@jrgm
Contributor Author

jrgm commented Jan 10, 2020

I added a note in the docstring, and an inline comment about << max_offset.

@jrgm jrgm force-pushed the dockerpush.purge-old-records-additional-logging-and-select-offset branch 2 times, most recently from 9df5e00 to cc4e929 Compare January 11, 2020 02:26
@jrgm jrgm force-pushed the dockerpush.purge-old-records-additional-logging-and-select-offset branch from cc4e929 to 476be96 Compare January 11, 2020 02:34
@jrgm jrgm merged commit 8890f95 into master Jan 11, 2020