`inserted` might be None #1525
One immediate thought that springs to mind is that the query uses `UNION ALL`.
Sorry, the "UNION ALL" does the opposite. "This PostgreSQL UNION ALL operator would return a category_id multiple times in your result set if the category_id appeared in both the products and categories table. The PostgreSQL UNION ALL operator does not remove duplicates. If you wish to remove duplicates, try using the PostgreSQL UNION operator." "[UNION] eliminates duplicate rows from its result, in the same way as DISTINCT, unless UNION ALL is used." |
Happened again in buildhub: https://gist.github.com/peterbe/fca3ab72f91869bd3f6218a69dec878b
More examples for those who can see this: https://sentry.prod.mozaws.net/operations/kinto-testpilot-prod/issues/3321530/
I cannot reproduce, but I don't think …
When you say "I cannot reproduce", did you mean you wrote a unit test (functional, with real Postgres) or did you mean you tried via HTTP? I studied the SQL and tested it locally and it looks correct. If the record doesn't exist, it works. If the record existed but was deleted, it works (and makes it not deleted). If it already existed and wasn't deleted, it still works.
How could that happen?? The only time the first part of the … I'm starting to suspect that it lies somewhere else, outside this SQL query. I don't actually know how transactions in kinto are done, but one thing that makes me nervous is that the SQL query is a write operation (sometimes) but there's no commit on the transaction.
I've tried really hard to reproduce this and I'm failing. I think the current solution isn't thread-safe, but it's hard to prove because of the way transactions are done in kinto. It's not immediately clear where the commit happens after the … The SQL query can do three things:
If the first thing happens, it will raise a … But it's not clear where the … One interesting thing to remember is that we've seen a lot of these errors in buildhub. In buildhub, we never delete things, i.e. we never set …
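To make that concrete, here is a sketch of the general shape of such a conditional insert. This is not the actual kinto statement; table and column names are made up. The point is that when the guarding WHERE clause filters the insert out, the statement affects zero rows, RETURNING produces nothing, and `fetchone()` comes back `None`:

```python
# Not the actual kinto statement -- a simplified sketch of a conditional
# insert of the kind being discussed, using psycopg2-style placeholders.
CREATE_IF_MISSING = """
    INSERT INTO records (id, parent_id, data)
    SELECT %(id)s, %(parent_id)s, %(data)s
     WHERE NOT EXISTS (
        SELECT 1 FROM records
         WHERE id = %(id)s AND parent_id = %(parent_id)s
     )
    RETURNING id
"""

def create_record(cursor, params):
    cursor.execute(CREATE_IF_MISSING, params)
    row = cursor.fetchone()
    # Possible outcomes:
    #  - a row: the INSERT happened and RETURNING gives it back;
    #  - None: the WHERE NOT EXISTS guard filtered the insert out, e.g.
    #    because a concurrent request created the same record first --
    #    which is the "inserted might be None" situation in this issue.
    return row
```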
It concerns me that this can happen. Perhaps we can work together to audit how the transactions are committed. I really think there are better patterns to insert something …
Me too! I spent quite a while on this today and could not come up with a clear path that would lead to this kind of situation either. According to Sentry it happened like 100 times on Testpilot and almost 500 times on buildhub. This is far from being paranormal!
The transactions are bound to the request lifecycle using pyramid_tm and this SQLAlchemy binding: kinto/kinto/core/storage/postgresql/client.py, lines 105 to 107 in 4c03306.
Commits or rollbacks are done just before the response is served. In buildhub we use kinto-elasticsearch, which introduces a round-trip to an ES server in the response cycle. It could make the transactions last a lot longer than normal.
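For context, that wiring typically looks something like this. This is a sketch of the usual zope.sqlalchemy / pyramid_tm pattern, not a verbatim copy of client.py, and the DSN is a placeholder:

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
from zope.sqlalchemy import register

engine = create_engine("postgresql://localhost/kinto")  # placeholder DSN

# The session joins the zope transaction manager, so commit/rollback are
# driven by pyramid_tm at the end of the request, not by the storage
# backend at the moment it runs its queries.
session_factory = sessionmaker(bind=engine)
session = scoped_session(session_factory)
register(session)
```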
:)
Some progress! See (but don't judge me on how ugly the code is): https://gist.github.com/peterbe/883da6f65e315383ca83b7ade5e94e42 When you run that you get this: …
...every time! What it simulates is two concurrent threads. If you merge their timelines you get this: …
Basically, if you extrapolate that to the Python/kinto layer, I think this happens: …
The combination (parent_id, collection_id, last_modified) is different in both calls, so you don't get trapped by the … Please check if I've understood that right. If that's the case, the right way to write a unit test would be something like these pseudo-tests: …
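A rough sketch of what such a concurrency test could look like. The `storage` fixture and the `create()` keyword arguments are assumptions, not the exact kinto API:

```python
import threading

def test_concurrent_create(storage):
    # `storage` stands for a configured PostgreSQL backend fixture.  In a
    # real test each thread would also need its own connection/transaction.
    results, errors = [], []

    def attempt_create():
        try:
            results.append(storage.create(
                collection_id="record",
                parent_id="/buckets/test/collections/demo",
                record={"id": "abc", "hello": "world"},
            ))
        except Exception as exc:  # ideally a specific conflict error
            errors.append(exc)

    threads = [threading.Thread(target=attempt_create) for _ in range(2)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # Exactly one client should win; the other should get a clean conflict
    # instead of a TypeError on a None result.
    assert len(results) == 1
    assert len(errors) == 1
```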
Erwin suggested you use a loop instead. I don't really yet know what that means for us. Either way, we could perhaps instead reject at the Python level if `inserted` is None. The reason that's probably a good idea is that one of the clients did get an OK, so it would not make sense to update the record for that other client. Meaning, if it's true that two concurrent clients attempt the same …
Wao thanks! Excellent :)
Definitely, if we know that when `inserted` is None we are in a conflict situation, we can raise a Python exception. The client should probably get a …
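Something along these lines (a sketch only; the exception name is hypothetical, and the real code would presumably use kinto's own storage exception and map it to a 409 at the HTTP layer):

```python
class RecordConflict(Exception):
    """Hypothetical exception name; kinto would use its own storage
    exception and translate it into a 409 response."""


def unwrap_create_result(result):
    # `result` is a DB-API / SQLAlchemy result whose single row carries an
    # "inserted" column, mirroring the query discussed in this issue.
    row = result.fetchone()
    if row is None:
        # A concurrent request won the race: fail loudly instead of
        # crashing later with "'NoneType' object is not subscriptable".
        raise RecordConflict("record was created by a concurrent request")
    return row['inserted']
```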
The easy fix would be to pursue my unfinished PR and just deal with the fact that the … The upsert prototype tries to break up the task by attempting to do an update (e.g. undoing the logical delete) with an UPDATE (using row-level locking); then it tries to do an insert and watches out for IntegrityErrors. That makes sure we can control two concurrent attempts to call … Looking for thoughts and feedback on this.
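For reference, the update-then-insert pattern described above might look roughly like this. This is a sketch, not the unfinished PR; table and column names are invented:

```python
from sqlalchemy import text
from sqlalchemy.exc import IntegrityError

# UPDATE takes a row-level lock, so a concurrent transaction touching the
# same row waits for us instead of racing us.
UNDELETE = text("""
    UPDATE records
       SET deleted = FALSE, data = :data
     WHERE id = :id AND parent_id = :parent_id
    RETURNING id
""")
INSERT = text("""
    INSERT INTO records (id, parent_id, data, deleted)
    VALUES (:id, :parent_id, :data, FALSE)
    RETURNING id
""")

def upsert(connection, params):
    if connection.execute(UNDELETE, params).fetchone() is not None:
        return "updated"   # row existed (possibly tombstoned) and was claimed
    try:
        # In a real implementation this INSERT would run inside a savepoint,
        # because an IntegrityError aborts the current PostgreSQL transaction.
        connection.execute(INSERT, params)
        return "inserted"
    except IntegrityError:
        # A concurrent transaction created the row between our two steps.
        return "conflict"
```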
I wonder if an easy fix in that case wouldn't be to raise a 409 Conflict error. |
Just to recap some of the above:
This might be a good time to talk about switching to a stronger isolation level, for example repeatable read. This would at least get us to a place where transactions behave predictably. On the other hand, we'd then have to be ready to handle serialization errors, which might require a lot more work. The Postgres documentation does say (about read committed): "Because of the above rules, it is possible for an updating command to see an inconsistent snapshot: it can see the effects of concurrent updating commands on the same rows it is trying to update, but it does not see effects of those commands on other rows in the database." Indeed, I think that's what we're seeing. This behavior seems to remain in v10 and v11.
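For what it's worth, handling serialization errors with plain psycopg2 would look roughly like this (a sketch; the DSN is a placeholder): run at REPEATABLE READ and retry the whole transaction when PostgreSQL aborts it.

```python
import psycopg2
from psycopg2 import extensions

conn = psycopg2.connect("dbname=testdb")  # placeholder DSN
conn.set_session(isolation_level=extensions.ISOLATION_LEVEL_REPEATABLE_READ)

def run_with_retry(do_transaction, attempts=3):
    for _ in range(attempts):
        try:
            with conn:  # commits on success, rolls back on error
                with conn.cursor() as cursor:
                    return do_transaction(cursor)
        except extensions.TransactionRollbackError:
            continue  # serialization failure (SQLSTATE 40001): safe to retry
    raise RuntimeError("gave up after repeated serialization failures")
```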
See kinto/kinto/core/storage/postgresql/__init__.py, lines 306 to 308 in a6b0838.
If `inserted` becomes `None`, you get a `TypeError` trying to call `inserted['inserted']`.
This has actually happened in a real production instance:
For those with access see this Sentry entry.
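The failure mode itself is plain Python: subscripting None raises the TypeError (a standalone illustration, not the code at the referenced lines):

```python
# What cursor.fetchone() gives back when the statement returned no row:
inserted = None

# The backend then does the equivalent of this, which blows up:
inserted['inserted']   # TypeError: 'NoneType' object is not subscriptable
```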