Skip to content
This repository has been archived by the owner on Feb 8, 2018. It is now read-only.

someone managed to orphan their participant! #617

Closed
chadwhitacre opened this issue Feb 7, 2013 · 29 comments · Fixed by #1883
Closed

someone managed to orphan their participant! #617

chadwhitacre opened this issue Feb 7, 2013 · 29 comments · Fixed by #1883

Comments

@chadwhitacre
Copy link
Contributor

Check this out:

https://www.gittip.com/c9b2c1e22b5d/

I discovered this by expanding the leaderboard, like so:

https://www.gittip.com/?limit=20

Investigate!

@chadwhitacre
Copy link
Contributor Author

Another one:

https://www.gittip.com/d6b62b6b95c9/

@abnor
Copy link

abnor commented Mar 8, 2013

https://www.gittip.com/?limit=1000

woooooahhh. Is there potential to make this feature a drop down menu option as a sort of primitive search engine?

@chadwhitacre
Copy link
Contributor Author

@abnor Oooh! That's neat with the new design, good find. :)

BTW, what's your Twitter?

@abnor
Copy link

abnor commented Mar 8, 2013

I do not have a Twitter... Should I?

@chadwhitacre
Copy link
Contributor Author

@abnor Yes. :)

@abnor
Copy link

abnor commented Mar 8, 2013

Okayyyy

https://twitter.com/LyndyPalmer

@chadwhitacre
Copy link
Contributor Author

@chadwhitacre
Copy link
Contributor Author

I am making the world better by actually being https://www.gittip.com/whit537/.

https://www.gittip.com/6a2ab9485317/

wtf?

@chadwhitacre
Copy link
Contributor Author

@chadwhitacre
Copy link
Contributor Author

@chadwhitacre
Copy link
Contributor Author

@chadwhitacre
Copy link
Contributor Author

+1 from @mvdkleijn on Twitter.

@mvdkleijn
Copy link
Contributor

Solution seems simple to me... delete (on a regular basis) all profiles that don't have any accounts connected and don't have any bank or credit card account connected.

Also: are we sure these are accidentally orphaned accounts or could these be some form of spam?

Note: https://www.gittip.com/1a36b29a7161/ has a credit card connected...

@chadwhitacre
Copy link
Contributor Author

@mvdkleijn The db repair is straightforward, yes. The hard part is understanding how these happen in the first place and closing any loopholes.

@mvdkleijn
Copy link
Contributor

@whit537 I’ll see if I can run / think of some test scenarios to replicate the problem. If any orphaned accounts have funds escrowed with us we'll need to figure out what to do with that.

@mvdkleijn
Copy link
Contributor

Possibly related to how Twitter handles accounts (and deactivation)? See screenshot in #54 and the remark by @sprice at #54 (comment)

@chadwhitacre
Copy link
Contributor Author

@zbynekwinkler
Copy link
Contributor

I am looking into this issue (ref. #1549, #1598, #650). Right now we have 559 orphaned accounts.

@zbynekwinkler
Copy link
Contributor

The code in question is in https://github.com/gittip/www.gittip.com/blob/master/gittip/models/_mixin_elsewhere.py#L119. The orphaned participant is just renamed to a random username and kept in the participants table. Since I do not know the history of the code I don't know if it has ever worked differently.

If this is the correct way to count orphaned participants

select count(*) from participants where not exists (select * from elsewhere where elsewhere.participant=username);

then we actually have 6182 of them (and not 559) out of almost 50k. The absorptions table has 6169 rows so we have 13 unacounted orphans that I am not sure where they come from or what to do with them.
https://dataclips.heroku.com/buljjtubfplplynvcchxsdsqewyf
One of them has been created just a couple hours ago. Interesting about them is that ctime and session_expires are identical.

Steps to create orphaned participant:

  1. start with empty db (make schema)
  2. visit /on/[platform]/[name] of at least 2 of your identities - creates 2 participants with random usernames
  3. sign in with one - renames one of the participants to a [name]
  4. connect the other identity to the currently signed in participant - creates entry in absorptions but leaves renamed orphaned participant behind

@whit537 Why do we keep orphans around? Is there some danger in deleting the orphans that have corresponding entry in absorptions? Do you rework this part of the code in the elsewhere branch or can/should this be a separate fix?

@ghost ghost assigned zbynekwinkler Oct 19, 2013
@zbynekwinkler
Copy link
Contributor

The query to show the unexplained orphans:

select username, ctime, claimed_time, session_expires
   from participants
  where not exists (select * from elsewhere where elsewhere.participant=username)
    and not exists (select * from absorptions where archived_as=username)
order by ctime desc;

@chadwhitacre
Copy link
Contributor Author

Dropping from Infrastructure per IRC.

@zbynekwinkler
Copy link
Contributor

I decided to revisit this issue now when we fixed #1704. The accounts that were involved in manifestation of that bug are:

c9b2c1e22b5d - first reported
a5ad0ebe3dff
182eed83c351
4c074000c7bc
187cb554c30a
bb670f195cd3
43cafe74bc45
81d042613b23

I have run github searches for all of them but except the first one, none were involved in any bugs.

I am closing this issue on the grounds that keeping orphans in the participants table is "by design" so there is nothing surprising that we have them. On the other hand orphans should not be involved in any part of the site and if they are let's track that as a separate issue. Should anyone feel differently, please reopen.

@zbynekwinkler
Copy link
Contributor

Oh well, that was too quick. There are still unexplained orphans from #617 (comment).

Continuing in #1768.

@zbynekwinkler
Copy link
Contributor

I have a new theory on where the unexplained orphans might be coming from. Recently we got a new orphan #1856. It has the same ctime as session_expire - both default to now in Postgres. This likely means that the account has been created but never used for anything. The only 2 places where we create accounts are

gittip/elsewhere/init.py#L97
gittip/models/_mixin_elsewhere.py#L327

Both places immediately attach elsewhere account. Both are run in a transaction that should either commit both or rollback both. The catch might be the transaction isolation level http://www.postgresql.org/docs/9.1/static/transaction-iso.html. Default level is read committed. The higher levels are repeatable read and serializable.

When the site is under a stress (read slow) user might hit refresh in the take over page trying to get it working. It does not cancel the currently running request handling and starts another one with the same data. Both transaction can get to a point gittip/models/_mixin_elsewhere.py#L327 where they would be wrestling with trying to attach a single account elsewhere to two newly created participants, obviously only one of them winning.

I'll try to prepare a simplified example to illustrate the situation and actually confirm if something like that is possible and if it gets fixed by using higher levels of transaction isolation.

@zbynekwinkler
Copy link
Contributor

Why do we actually do gittip/models/_mixin_elsewhere.py#L327? Why don't we just dump the account elsewhere? We are not using it for anything (not attaching any information to it). If the proper '/on/...' url gets visited the elsewhere account gets recreated if missing anyway. @whit537 anything I am missing here? I love deleting code 😛

@zbynekwinkler
Copy link
Contributor

To recreate the problem by hand, follow the these steps:

  1. make schema
  2. create participant 'a'
  3. attach elsewhere platform='g', user_id='a'
  4. run fist psql session - start transaction, create participant 'a1'
  5. run second psql session - start transaction, create participant 'a2'
  6. in first session run - update elsewhere set participant='a1' where platform='g' and participant='a' - this will update 1 row - commit
  7. in second session try to do the same for 'a2' - update will not match any rows (where participant='a') because first transaction is already commited, thus creating an orphan

Which is what is happening. If we were to upgrade the isolation level at least to 'repeatable read' we would not have this problem because the whole transaction would get snapshot of the db at the beginning of the transaction (update would find the row to update, which would conflict and raise error). To be able to upgrade the isolation level of the transactions we need to be able to deal with serialization errors. Also further (deeper) thought needs to go into API review - for example current take_over function if called after serialization failure would unexpectedly undo itself.

TL DR;

  • We need to increase transaction isolation (possible in the whole app).
  • To be able to do that, all places need to be able to deal with serialization errors.
  • To be able to do that, the parts of code we rerun need to be idempotent.

@zbynekwinkler
Copy link
Contributor

Looking at papertrail what was happening at the time the orphan was created I have only one relevant line which seems to point to the take_over direction.

Jan 09 12:16:55 gittip heroku/router:  at=info method=POST path=/on/take-over.html host=www.gittip.com fwd="x.x.x.x" dyno=web.1 connect=1ms service=40ms status=302 bytes=0 

@chadwhitacre
Copy link
Contributor Author

!m @zwn

Why don't we just dump the account elsewhere? We are not using it for anything (not attaching any information to it).

The one thing I can think of is that we keep the association with the old participant, so if needed we would be able to reconstruct that the old participant has this elsewhere attached to it. In the long run #1549 is the way to go.

@chadwhitacre
Copy link
Contributor Author

I ran the branch.sql to delete the users. I didn't append it to schema.sql because it changed data and not schema. Pruned in bd32cea.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants