Instance: 449970630051
Traceback (most recent call last):
File "/app/deploy/ensembles/20180925-094251-bmuppana-cc417819a7354838/correctness/document-correctness.py", line 550, in <module>
okay = ns['func'](ns)
File "/app/deploy/ensembles/20180925-094251-bmuppana-cc417819a7354838/correctness/document-correctness.py", line 438, in start_forever_test
return test_forever(ns)
File "/app/deploy/ensembles/20180925-094251-bmuppana-cc417819a7354838/correctness/document-correctness.py", line 378, in test_forever
(client1, client2, collection1, collection2) = get_clients_and_collections(ns)
File "/app/deploy/ensembles/20180925-094251-bmuppana-cc417819a7354838/correctness/document-correctness.py", line 43, in get_clients_and_collections
transactional_shim.remove(collection1)
File "/app/deploy/ensembles/20180925-094251-bmuppana-cc417819a7354838/correctness/transactional_shim.py", line 8, in func_wrapper
ret = func(*args, **kwargs)
File "/app/deploy/ensembles/20180925-094251-bmuppana-cc417819a7354838/correctness/transactional_shim.py", line 47, in func_wrapper
return func(*args, **kwargs)
File "/app/deploy/ensembles/20180925-094251-bmuppana-cc417819a7354838/correctness/transactional_shim.py", line 56, in _gen_func
return getattr(collection, name)(*args, **kwargs)
File "/app/.python2/lib/python2.7/site-packages/pymongo/collection.py", line 2996, in remove
spec_or_id, multi, write_concern, collation=collation)
File "/app/.python2/lib/python2.7/site-packages/pymongo/collection.py", line 1123, in _delete_retryable
_delete, session)
File "/app/.python2/lib/python2.7/site-packages/pymongo/mongo_client.py", line 1102, in _retryable_write
return self._retry_with_session(retryable, func, s, None)
File "/app/.python2/lib/python2.7/site-packages/pymongo/mongo_client.py", line 1079, in _retry_with_session
return func(session, sock_info, retryable)
File "/app/.python2/lib/python2.7/site-packages/pymongo/collection.py", line 1119, in _delete
retryable_write=retryable_write)
File "/app/.python2/lib/python2.7/site-packages/pymongo/collection.py", line 1099, in _delete
_check_write_command_response(result)
File "/app/.python2/lib/python2.7/site-packages/pymongo/helpers.py", line 207, in _check_write_command_response
_raise_last_write_error(write_errors)
File "/app/.python2/lib/python2.7/site-packages/pymongo/helpers.py", line 189, in _raise_last_write_error
raise WriteError(error.get("errmsg"), error.get("code"), error)
pymongo.errors.WriteError: "Collection metadata changed during operation."
This turned out to be a race condition in the creation of new collection metadata.
On every request, collection metadata is fetched with the function assembleCollectionContext(). This function creates a new collection if the collection is not already present. The collection context is also cached for performance. When a new collection is created, it is not immediately inserted into the cache, because the transaction might still fail; the next request inserts it into the cache.
The race condition happens with the following steps:
1. A write request on an empty collection creates a new collection in the transaction context but does not insert it into the cache.
2. If the request takes too long and is non-isolated, NonIsolatedPlan splits the transaction and commits it, which persists the new collection metadata.
3. Later, before creating a new transaction, it checks that the metadata is still valid. Because that check goes through assembleCollectionContext(), a new context is created again, and it doesn't match the old context that was created in step 1. Hence the error.
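The steps above can be sketched with a toy model. This is only an illustration under assumptions, not the actual Document Layer code: `Context`, `Store`, `run_split_write` and the `ident` field are hypothetical names, and `ident` stands in for whatever the real validity check compares.

```python
import itertools

_ids = itertools.count(1)

class Context:
    """Hypothetical collection context; `ident` stands in for whatever
    the real metadata validity check compares."""
    def __init__(self, name):
        self.name = name
        self.ident = next(_ids)

class Store:
    """Toy database: committed collection names plus the context cache."""
    def __init__(self):
        self.collections = set()
        self.cache = {}

def assemble_collection_context(store, pending, name):
    """Toy get-or-create with the caching rule described in the issue."""
    if name in store.cache:
        return store.cache[name]
    ctx = Context(name)              # cache miss: build a fresh context
    if name in store.collections:
        store.cache[name] = ctx      # metadata is committed, safe to cache
    else:
        pending.add(name)            # created in this transaction only, not cached
    return ctx

def run_split_write(store, name):
    pending = set()
    ctx = assemble_collection_context(store, pending, name)    # step 1
    store.collections |= pending     # step 2: the split plan commits mid-request
    pending = set()
    fresh = assemble_collection_context(store, pending, name)  # step 3: validity check
    if fresh.ident != ctx.ident:
        raise RuntimeError("Collection metadata changed during operation.")
    return ctx
```

In this model the first `run_split_write` on an empty store raises, while a second call succeeds because the context is served from the cache by then, which is consistent with the failure only showing up on a brand-new collection.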
A couple of things we need to do for this issue:
- Change assembleCollectionContext() to create a new collection only if asked.
- Only inserts and upserts should create a new collection.
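A minimal sketch of what the two changes could look like, again as a toy model. The `create_if_missing` parameter, the `CREATING_OPS` set and the `context_for` helper are assumptions made for illustration; the real function's signature is not shown in this issue.

```python
# Hypothetical: the only operations allowed to create a collection.
CREATING_OPS = {"insert", "upsert"}

def assemble_collection_context(committed, cache, name, create_if_missing=False):
    """Toy version where creation is opt-in instead of implicit."""
    if name in cache:
        return cache[name]
    if name in committed:
        cache[name] = {"name": name, "created": False}
        return cache[name]
    if not create_if_missing:
        return None                  # no such collection, and we were not asked to create it
    # Created inside the current transaction; deliberately not cached yet.
    return {"name": name, "created": True}

def context_for(op, committed, cache, name):
    """Pick the creation behaviour from the operation type."""
    return assemble_collection_context(
        committed, cache, name, create_if_missing=op in CREATING_OPS
    )
```

With this shape, a remove or find on a missing collection gets `None` back and has nothing to do, and a validity check that calls the function without the flag can no longer create a second context as a side effect.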
Not reproducible with any specific seed. Happens occasionally, 1 out of 3000 runs.
apkar changed the title from 'Correctness tests occasionally failing with "Collection metadata changed during operation"' to 'New collection creation can fail, by racing with itself on collection metadata' on Jan 29, 2019
After looking at the code a bit more, I think the better way of dealing with this is to keep collection creation in a separate transaction. This weakens the consistency guarantee: if a request that can create a new collection fails, it may still create the new collection.
In fact, that is already the case with the current code, but it happens only with larger requests. Keeping the collection creation in a separate transaction reduces the corner cases and keeps the code simple, and it doesn't break any consistency guarantee we provided before.
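A rough sketch of the separate-transaction idea, under the same toy-model assumptions (`ToyDB`, `ensure_collection` and the `fail` flag are hypothetical and only simulate commit and abort):

```python
class ToyDB:
    """Toy database: committed collection names and their documents."""
    def __init__(self):
        self.collections = set()
        self.docs = {}

def ensure_collection(db, cache, name):
    """Create and commit the collection in its own (toy) transaction.
    Since the metadata is committed here, it can be cached right away."""
    if name not in cache:
        db.collections.add(name)
        cache[name] = {"name": name}
    return cache[name]

def insert(db, cache, name, doc, fail=False):
    ctx = ensure_collection(db, cache, name)   # transaction 1: metadata only
    if fail:
        raise RuntimeError("request failed")   # transaction 2 aborts
    db.docs.setdefault(name, []).append(doc)   # transaction 2: the actual write
    return ctx
```

The trade-off mentioned above shows up directly: a failed `insert` still leaves the collection behind, but the request always works against a committed, cached context, so there is nothing left to race with.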