This repository has been archived by the owner on Feb 8, 2018. It is now read-only.

Add the ability to self check the database. #1768

Merged
merged 10 commits into from
Jan 6, 2014

Conversation

zbynekwinkler
Contributor

Subclasses Postgres to prepare for write-wrapping for #1549
and to add self checks for #1118 and #1705.

Fixes #1118, #1705.

Subclasses Postgres to prepare for write-wrapping for #1549
and to add self checks for #1180 and #1705.

Fixes #1180, #1705.
@zbynekwinkler
Contributor Author

Currently the self check fails on the production database because there are unexplained orphans.

What shall we do with them?

 4c46cc22afdd
 82b0d81fe9e8
 6b6527ac6c02
 cbde8c31c11c
 bcfc65158eaf
 31d54a0c19ae
 afbddadaac3c
 a78c4e42bb93
 42eb93b3ab89
 b1bc5e47fe8e
 3ac515cc8da6
 14a79340c40d
 14d60c6884e7
 0c783dee50ed
 e2020536ef6d
 60a5099d49c7
 64f4f959b322

Edit: Ordered from the newest (2013-12-05) to the oldest (2012-09-20). The oldest (64f4f959b322) is the only one claimed.

@zbynekwinkler
Contributor Author

How could the orphans above have been created?

  • row with elsewhere.participant was deleted
  • row with elsewhere.participant was updated to point to a different participant
  • participant was renamed
  • participant was created but elsewhere was not

To check the above mentioned cases:

  • nothing is ever deleted from elsewhere
  • the only place where elsewhere.participant is updated is in take_over
  • the field elsewhere.participant is constrained to be a foreign key to participants.username with 'on update cascade' so it follows renames
  • participant is created either in upsert (immediately sets elsewhere) or in reserve_a_random_username (mostly used again in take_over)

So, where does that leave us? Another bug in take_over?
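
For readers following along, the kind of check under discussion can be sketched as a single query. This is only an illustration under stated assumptions, not the code in this pull request: it uses an in-memory SQLite database as a stand-in for Postgres, a heavily simplified schema, and it assumes that an orphan is a participant with no row in elsewhere and no record of having been absorbed. The `absorptions.archived_as` and `absorbed_by` column names and the `find_orphans` and `make_demo_db` helpers are hypothetical.

```python
import sqlite3

# Hypothetical, simplified schema. Only the foreign key with
# ON UPDATE CASCADE is taken from the discussion above.
ORPHAN_SCHEMA = """
CREATE TABLE participants (username TEXT PRIMARY KEY);
CREATE TABLE elsewhere (
    platform    TEXT NOT NULL,
    user_id     TEXT NOT NULL,
    participant TEXT NOT NULL
        REFERENCES participants (username) ON UPDATE CASCADE,
    PRIMARY KEY (platform, user_id)
);
CREATE TABLE absorptions (
    archived_as TEXT NOT NULL
        REFERENCES participants (username) ON UPDATE CASCADE,
    absorbed_by TEXT NOT NULL
        REFERENCES participants (username) ON UPDATE CASCADE
);
"""

# Assumed definition of an orphan: no elsewhere account and never absorbed.
ORPHANS_SQL = """
SELECT p.username
  FROM participants p
 WHERE NOT EXISTS (SELECT 1 FROM elsewhere e WHERE e.participant = p.username)
   AND NOT EXISTS (SELECT 1 FROM absorptions a WHERE a.archived_as = p.username)
 ORDER BY p.username
"""


def find_orphans(conn):
    """Return the usernames that the assumed orphan definition flags."""
    return [row[0] for row in conn.execute(ORPHANS_SQL)]


def make_demo_db():
    """Build a tiny made-up database with exactly one orphan."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for cascades
    conn.executescript(ORPHAN_SCHEMA)
    conn.executemany(
        "INSERT INTO participants VALUES (?)",
        [("alice",), ("old-bob",), ("ghost",)],
    )
    conn.execute("INSERT INTO elsewhere VALUES ('github', '1', 'alice')")
    conn.execute("INSERT INTO absorptions VALUES ('old-bob', 'alice')")
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print(find_orphans(conn))  # ['ghost']
    # A rename does not create an orphan, because the foreign key cascades.
    conn.execute("UPDATE participants SET username = 'alice2' WHERE username = 'alice'")
    print(find_orphans(conn))  # ['ghost']
```

In this toy setup a rename leaves the result unchanged, which matches the reasoning above that 'on update cascade' rules out renames as a source of orphans.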

@zbynekwinkler
Contributor Author

git blame says that the absorptions table was created on 2012-12-12. So the oldest two accounts 'have the right' not to have an entry in there (both are from September last year). The rest are all from 2013 (the newest being just 11 days ago). That means that the code doing this is still with us, so we'd better find it! 😄

@zbynekwinkler
Contributor Author

I'll try to do some forensics using the backups. Since we have the automatic capture on, I'll try to compare the snapshot from 2013-11-30 with the one from 2013-12-05 or 2013-12-06 (whichever has the newest orphan).

@zbynekwinkler
Contributor Author

I ended up comparing 2013-11-29 and 2013-11-30. One of the changes looks suspicious and would explain why I haven't been able to find anything in the code. The timing of #54 (comment) nicely corresponds with the interval when the orphan appeared.

@whit537 Could these orphans be leftovers from manual changes to the db made to fulfill account deactivation requests? Have you ever created a participant and/or deleted from elsewhere and/or reset elsewhere.participant to someone else?

Also run self_check from the homepage updater thread. This way we should
learn as soon as possible about new orphans by Sentry report.
@zbynekwinkler
Contributor Author

Now the self_check passes on the production db. Turning it on in the homepage updater thread is an attempt to better identify the moment when the next orphan appears. Maybe that will give us a hint as to what is wrong.
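
The general pattern of running a check from a background thread and reporting failures can be sketched as below. This is a minimal illustration, not the homepage updater code itself: `run_periodically`, `start_checker`, and the `report` callback (a stand-in for whatever forwards errors to Sentry) are hypothetical names.

```python
import threading


def run_periodically(check, interval, stop, report):
    """Call check() every `interval` seconds until `stop` is set.

    Exceptions raised by check() are handed to report() instead of
    ending the loop, so one failed check does not stop the monitoring.
    """
    while not stop.is_set():
        try:
            check()
        except Exception as exc:
            report(exc)
        # Sleeps for up to `interval` seconds, but wakes early on stop.
        stop.wait(interval)


def start_checker(check, interval, report):
    """Start the loop in a daemon thread; return the thread and its stop event."""
    stop = threading.Event()
    thread = threading.Thread(
        target=run_periodically,
        args=(check, interval, stop, report),
        daemon=True,
    )
    thread.start()
    return thread, stop
```

Catching the exception inside the loop is a design choice of this sketch: a failing check is reported once per run and the thread keeps going, which is what makes it usable for spotting the moment a new orphan appears.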

I now consider this pull request ready to review and/or merge.

@zbynekwinkler
Contributor Author

Sorry if this seems tiring but I have another problem 😉. I tried to add a check that accounts that have been absorbed have no nonzero outstanding tips.
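
One way such a check could look is sketched below. It is an illustration only, under several assumptions: SQLite stands in for Postgres, the `tips` and `absorptions` tables are reduced to hypothetical columns, amounts are stored as integer cents, the current tip for a pair is taken to be the row with the latest `mtime`, and only tips given by absorbed accounts are considered.

```python
import sqlite3

# Hypothetical, simplified tables; the real schema may differ.
TIP_SCHEMA = """
CREATE TABLE absorptions (archived_as TEXT, absorbed_by TEXT);
CREATE TABLE tips (tipper TEXT, tippee TEXT, amount_cents INTEGER, mtime TEXT);
"""

# Latest tip row per (tipper, tippee) whose tipper was absorbed and
# whose amount is still above zero.
OUTSTANDING_SQL = """
SELECT t.tipper, t.tippee, t.amount_cents
  FROM tips t
  JOIN absorptions a ON a.archived_as = t.tipper
 WHERE t.amount_cents > 0
   AND t.mtime = (SELECT max(t2.mtime)
                    FROM tips t2
                   WHERE t2.tipper = t.tipper
                     AND t2.tippee = t.tippee)
 ORDER BY t.tipper, t.tippee
"""


def outstanding_tips_of_absorbed(conn):
    """Return (tipper, tippee, amount_cents) rows that violate the check."""
    return conn.execute(OUTSTANDING_SQL).fetchall()


def make_tip_demo_db():
    """Made-up data: one absorbed account zeroed correctly, one not."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(TIP_SCHEMA)
    conn.executemany(
        "INSERT INTO absorptions VALUES (?, ?)",
        [("old-bob", "alice"), ("old-dan", "alice")],
    )
    conn.executemany(
        "INSERT INTO tips VALUES (?, ?, ?, ?)",
        [
            ("old-bob", "carol", 100, "2013-01-01T00:00:00"),
            ("old-bob", "carol", 0, "2013-01-02T00:00:00"),    # zeroed: fine
            ("old-dan", "erin", 0, "2013-01-02T00:00:00"),
            ("old-dan", "erin", 50, "2013-01-03T00:00:00"),    # set again later
            ("alice", "carol", 200, "2013-01-02T00:00:00"),    # not absorbed
        ],
    )
    return conn


if __name__ == "__main__":
    print(outstanding_tips_of_absorbed(make_tip_demo_db()))
    # [('old-dan', 'erin', 50)]
```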

But account 4c074000c7bc has an outstanding nonzero tip. The absorption occurred at second 26, while the new tip is from second 38, so it was created after the absorption. From the point of view of take_over it is a self tip. The times are:

claimed_time:    09:48:14.984393+00
tip ctime:       09:48:15.788727+00
absorption:      09:49:26.713256+00
session_expires: 09:49:26.713256+00
tip mtime:       09:49:38.67472+00

So... how could someone create a tip 12s after being signed out? The user has also managed to create 6 tips within 1s, so most likely a script has been used.
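
The 12s figure can be reproduced from the timestamps listed above. The snippet below is just that arithmetic; it assumes all five times fall on the same day and drops the shared `+00` offset, and `seconds_between` is a hypothetical helper.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S.%f"


def seconds_between(start, end):
    """Seconds from `start` to `end`, both given as HH:MM:SS.ffffff strings."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds()


# Values copied from the list above, without the common +00 offset.
claimed_time = "09:48:14.984393"
tip_ctime = "09:48:15.788727"
absorption = "09:49:26.713256"
tip_mtime = "09:49:38.67472"

if __name__ == "__main__":
    # Gap between the absorption (and session expiry) and the late tip.
    print(round(seconds_between(absorption, tip_mtime), 2))     # 11.96
    # Gap between claiming the account and the first tip.
    print(round(seconds_between(claimed_time, tip_ctime), 2))   # 0.8
```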

@zbynekwinkler
Contributor Author

I am not able to find what code was online in May this year (take_over). I wanted to check whether the code to reset the session was part of take_over at that time. Anyway, I propose to remove the offending tip.

I might come up with more self checks 😉

Edit: I've found it, the session reset was there. So I have no idea how that happened.

@chadwhitacre
Contributor

This is awesome. I need to read it.

@bruceadams
Contributor

This is great stuff. I'd love to have a conversation with @zwn to walk through this code.

chadwhitacre and others added 4 commits January 6, 2014 12:35
Conflicts:
	configure-aspen.py
	gittip/testing/__init__.py
	tests/test_billing.py
	tests/test_charts_json.py
Conflicts:
	configure-aspen.py
	gittip/testing/__init__.py
	tests/test_billing.py
	tests/test_charts_json.py
chadwhitacre added a commit that referenced this pull request Jan 6, 2014
Add the ability to self check the database.
@chadwhitacre chadwhitacre merged commit 1f39787 into master Jan 6, 2014
@chadwhitacre chadwhitacre deleted the self-check branch January 6, 2014 20:32
Development

Successfully merging this pull request may close these issues.

Sanity-check balances
3 participants