`inserted` might be None #1525

peterbe · 2018-03-01T12:45:35Z

See

kinto/kinto/core/storage/postgresql/__init__.py

Lines 306 to 308 in a6b0838

    
               inserted = result.fetchone() 
        
           if not inserted['inserted']:

If inserted becomes None you get a TypeError trying to call inserted['inserted'].

This has actually happened in a real production instance:

Exception(most recent call first)

TypeError: 'NoneType' object is not subscriptable
  File "kinto/core/utils.py", line 414, in follow_subrequest
    return request.invoke_subrequest(subrequest, **kwargs), subrequest
  File "pyramid/router.py", line 195, in invoke_subrequest
    return self.invoke_request(request, _use_tweens=use_tweens)
  File "pyramid/router.py", line 249, in invoke_request
    response = handle_request(request)
  File "pyramid/router.py", line 156, in handle_request
    view_name
  File "pyramid/view.py", line 642, in _call_view
    response = view_callable(context, request)
  File "pyramid/config/views.py", line 181, in __call__
    return view(context, request)
  File "pyramid/viewderivers.py", line 390, in attr_view
    return view(context, request)
  File "pyramid/viewderivers.py", line 368, in predicate_wrapper
    return view(context, request)
  File "pyramid/viewderivers.py", line 301, in secured_view
    return view(context, request)
  File "pyramid/viewderivers.py", line 439, in rendered_view
    result = view(context, request)
  File "pyramid/viewderivers.py", line 148, in _requestonly_view
    response = view(request)
  File "cornice/service.py", line 494, in wrapper
    response = view_()
  File "kinto/core/resource/__init__.py", line 468, in put
    record = self.model.create_record(new_record)
  File "kinto/core/resource/model.py", line 315, in create_record
    record = super().create_record(record, parent_id)
  File "kinto/core/resource/model.py", line 191, in create_record
    auth=self.auth)
  File "statsd/client.py", line 37, in _wrapped
    return_value = f(*args, **kwargs)
  File "kinto/core/storage/postgresql/__init__.py", line 308, in create
    if not inserted['inserted']:

For those with access see this Sentry entry.

The text was updated successfully, but these errors were encountered:

peterbe · 2018-03-01T12:51:11Z

One immediate thought that springs to mind is that the query uses UNION ALL for two queries. (The ALL just helps to avoid duplicates across the queries). But the python code uses fetchone(). There's something illogical there. "Give be A and B. Result is the first one."

peterbe · 2018-03-01T12:53:36Z

Sorry, the "UNION ALL" does the opposite.

"This PostgreSQL UNION ALL operator would return a category_id multiple times in your result set if the category_id appeared in both the products and categories table. The PostgreSQL UNION ALL operator does not remove duplicates. If you wish to remove duplicates, try using the PostgreSQL UNION operator."
-- https://www.techonthenet.com/postgresql/union_all.php

"[UNION] eliminates duplicate rows from its result, in the same way as DISTINCT, unless UNION ALL is used."
-- https://www.postgresql.org/docs/8.3/static/queries-union.html

peterbe · 2018-03-13T16:13:16Z

Happened again in buildhub: https://gist.github.com/peterbe/fca3ab72f91869bd3f6218a69dec878b

peterbe · 2018-03-19T20:22:12Z

More examples for those who can see this: https://sentry.prod.mozaws.net/operations/kinto-testpilot-prod/issues/3321530/

leplatrem · 2018-04-16T10:25:05Z

inserted can be None when:

none of the 2 requests in the union returns a record
the cursor is consumed twice

I cannot reproduce, but I don't think UNION/UNION ALL is the problem here

peterbe · 2018-04-16T15:46:20Z

When you say "I cannot reproduce" did you mean you wrote a unit test (functional with real Postgres) or did you mean you tried via HTTP?

I studied the SQL and tested it locally and it looks correct. If the record doesn't exist, it works. If the record existed but was deleted, it works (and makes it not deleted). If it already existed and wasn't delete it still works.

none of the 2 requests in the union returns a record

How could that happen?? The only time the first part of the UNION would return 0 records is if it already exists (by id, parent_id and collection_id) and the update did not do anything. If that is the case, the second part would find it and return at least 1 record. But that makes no sense.

I'm starting to suspect that it lies somewhere else outside this SQL query. I don't actually know how transactions in kinto are done but one thing that makes me nervous is that the SQL query is a write operation (sometimes) but there's no commit on the transaction.

peterbe · 2018-04-16T19:26:40Z

I've tried really hard to reproduce this and failing.

I think the current solution isn't thread-safe but it's hard to prove because of the way transactions are done in kinto. It's not immediately clear where the commit happens after the kinto_client.create() record happens.

The SQL query can do three things:

Do nothing, just return the row unchanged (inserted=FALSE)
Insert the record (inserted=TRUE)
Update the existing record and change deleted=TRUE (inserted=TRUE)

If the first thing happens, it will raise a UnicityError. (I guess that's to inform the client that you tried to create a record that already existed!)

But it's not clear where the COMMIT happens. I suspect what might happens is that record is changing during an insert. But it's not clear either.

One interesting thing to remember is that we've seen a lot of these errors in buildhub. In buildhub, we never delete things. I.e. we never set deleted=TRUE on any records. So what happens there is probably concurrent calls to kinto_client.create(same, arguments, in, concurrent, threads). But the strange thing is that if you try to call this SQL with the same it should raise an exception like:

duplicate key value violates unique constraint "idx_records_parent_id_collection_id_last_modified"
DETAIL:  Key (parent_id, collection_id, last_modified)=(/buckets/build-hub/collections/releases, record, 2018-04-16 09:39:00.004) already exists.

It concerns me what this can happen. Perhaps we can work together to audit how the transactions are committed. I really think there are better patterns to insert something

leplatrem · 2018-04-16T20:03:24Z

It concerns me what this can happen.

Me too! I spent quite a while on this today and could not come up with a clear path that would lead to this kind of situation either. According to Sentry it happened like 100 times on Testpilot and almost 500 times on buildhub. This is far from being paranormal!

Perhaps we can work together to audit how the transactions are committed

The transactions are bound to the request lifecycle using pyramid_tm and this SQLAlchemy binding:

kinto/kinto/core/storage/postgresql/client.py

Lines 105 to 107 in 4c03306

    
               # Plug with Pyramid transaction manager 
        
               options['extension'] = ZopeTransactionExtension() 
        
           session_factory = scoped_session(sessionmaker(bind=engine, **options))

Commit or rollbacks are done just before the response is served.

In buildhub we use kinto-elasticsearch which introduces a round-trip to an ES server in the response cycle. It could make the transactions last a lot longer than normal.

I really think there are better patterns to insert something

:)

peterbe · 2018-04-16T20:33:40Z

Some progress!
I managed to demonstrate the problem that Erwin talks about in: https://stackoverflow.com/a/15950324/205832

See (but don't judge me how ugly code is): https://gist.github.com/peterbe/883da6f65e315383ca83b7ade5e94e42

When you run that you get this:

▶ python dummy.py
! 0
! 1
commit 1
1 page data = ('firefox_beta_58-0b15rc1_linux-i686_cs2', {}, 1523861592005, True)
commit 0
0 page data = None

...every time!

What it simulates is two concurrent threads. If you merge their timelines you get this:

(Thread 1) BEGIN
(Thread 2) BEGIN
(Thread 1) WITH create_record...
(Thread 2) WITH create_record...
(Thread 2) COMMIT

Basically, if you extrapolate that to the Python/kinto layer I think this happens:

# Thread 1
kinto_client.create('myid', 'mycollection', 'mybucket', '2018-04-16 06:53:12.004')

# Thread 2
kinto_client.create('myid', 'mycollection', 'mybucket', '2018-04-16 06:53:12.005')

The combination (parent_id, collection_id, last_modified) is different in both calls so you don't get trapped by the idx_records_parent_id_collection_id_last_modified uniqueness index on the inserts.

Please check if I've understood that right. If that's the case, the right way to write a unit test would be something like these pseudo kinto_client.create(...) above.

peterbe · 2018-04-16T20:41:41Z

I really think there are better patterns to insert something

Erwin suggested you use a loop instead. I don't really yet know what that means for us.

Either way, we could perhaps instead reject on a Python level if inserted = result.fetchone() becomes None. If that happens, we know you called create() with the same ID (and parent_id and collection_id) but with different last_modified). It could push the problem onto the calling client.

The reason that's probably a good idea is that one of the clients did get an OK so it would not make sense to update the record for that other client.

Meaning, if it's true that two concurrent clients attempt the same .create() (with possibly different JSON!) it would be weird to "correct" the second client's call.

leplatrem · 2018-04-17T13:33:35Z

Wao thanks! Excellent :)

Meaning, if it's true that two concurrent clients attempt the same .create() (with possibly different JSON!) it would be weird to "correct" the second client's call.

Definitely, if we know that when inserted is None we are in a conflict situation, we can raise a python exception. The client should probably get a 409 Conflict (that will be retried)

peterbe · 2018-04-24T20:22:27Z

The easy fix would be to pursue my unfinished PR and just deal with the fact that the inserted becomes None without any locks.

The upsert prototype tries to break up the task by attempting to do an update (e.g. undoing the logical delete) with an UPDATE (using row-level locking), then it tries to do an insert and watch out for IntegrityErrors. That makes sure we can control two concurrent attempts to call create() with the same parameters.

Looking for thoughts and feedback on this.

Natim · 2018-11-30T14:33:05Z

I have this bug often with the Default bucket apparently.

Natim · 2018-12-03T09:27:55Z

I wonder if an easy fix in that case wouldn't be to raise a 409 Conflict error.

glasserc · 2018-12-05T21:03:23Z

Just to recap some of the above:

inserted being none seems impossible because the query has two parts. The first part tries to insert the record and replaces any tombstone; the second part tries to fetch an existing record. inserted being none means it couldn't insert (there's an existing record) but also that it couldn't fetch an existing record, which seems inconsistent.
Like Peter says, it's probably coming from two different threads both doing create() at once. The first thread is inserting (or maybe updating?) successfully but hasn't committed; the second one is not able to insert but not able to read the other (uncommitted) row.
create() has to raise a UnicityError in case the record already exists. This is important for e.g. default_bucket not running object creation events. This is not exactly the same as raising a 409. Note that the two requests don't actually conflict so a 409 is a little bit wrong.
UnicityError takes as an argument the already-existing record. Of course, if we get no inserted result, then we don't have any way to get access to the already-existing record.
The code links to a StackOverflow answer about how to do a "SELECT or INSERT" query, although the answer has been updated since I landed this code. Although the existing code is based on the old answer, the current wisdom is that you can hit the same race condition we do, and that you have to either loop, use a different serialization level, or accept the possibility of the race condition.

This might be a good time to talk about switching to a stronger serialization level, for example repeatable read. This would at least get us to a place where transactions behave predictably. On the other hand, we'd then have to be ready to handle serialization errors, which might require a lot more work.

The Postgres documentation does say (about read committed): "Because of the above rules, it is possible for an updating command to see an inconsistent snapshot: it can see the effects of concurrent updating commands on the same rows it is trying to update, but it does not see effects of those commands on other rows in the database." Indeed, I think that's what we're seeing. This behavior seems to remain in v10 and v11.

leplatrem added the bug label Mar 1, 2018

peterbe added a commit to peterbe/kinto that referenced this issue Apr 17, 2018

inserted might be None, fixes Kinto#1525

abd9ca0

peterbe mentioned this issue Apr 17, 2018

[WIP] inserted might be None, fixes #1525 #1598

Closed

6 tasks

peterbe mentioned this issue Apr 24, 2018

TOCTOU issues at resource level #1407

Open

peterbe mentioned this issue Apr 30, 2018

Architecture for buildhub.json mozilla-services/buildhub#437

Closed

glasserc mentioned this issue Dec 5, 2018

Remove record field from UnicityError exception #1919

Merged

3 tasks

leplatrem closed this as completed in #1919 Dec 10, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`inserted` might be None #1525

`inserted` might be None #1525

peterbe commented Mar 1, 2018

peterbe commented Mar 1, 2018

peterbe commented Mar 1, 2018

peterbe commented Mar 13, 2018

peterbe commented Mar 19, 2018

leplatrem commented Apr 16, 2018

peterbe commented Apr 16, 2018

peterbe commented Apr 16, 2018

leplatrem commented Apr 16, 2018

peterbe commented Apr 16, 2018

peterbe commented Apr 16, 2018

leplatrem commented Apr 17, 2018

peterbe commented Apr 24, 2018

Natim commented Nov 30, 2018 •

edited

Loading

Natim commented Dec 3, 2018

glasserc commented Dec 5, 2018

inserted might be None #1525

inserted might be None #1525

Comments

peterbe commented Mar 1, 2018

peterbe commented Mar 1, 2018

peterbe commented Mar 1, 2018

peterbe commented Mar 13, 2018

peterbe commented Mar 19, 2018

leplatrem commented Apr 16, 2018

peterbe commented Apr 16, 2018

peterbe commented Apr 16, 2018

leplatrem commented Apr 16, 2018

peterbe commented Apr 16, 2018

peterbe commented Apr 16, 2018

leplatrem commented Apr 17, 2018

peterbe commented Apr 24, 2018

Natim commented Nov 30, 2018 • edited Loading

Natim commented Dec 3, 2018

glasserc commented Dec 5, 2018

`inserted` might be None #1525

`inserted` might be None #1525

Natim commented Nov 30, 2018 •

edited

Loading