New collection creation can fail, by racing with itself on collection metadata #51

Closed
2 tasks
apkar opened this issue Jan 28, 2019 · 2 comments

apkar commented Jan 28, 2019

Instance: 449970630051
Traceback (most recent call last):
  File "/app/deploy/ensembles/20180925-094251-bmuppana-cc417819a7354838/correctness/document-correctness.py", line 550, in <module>
    okay = ns['func'](ns)
  File "/app/deploy/ensembles/20180925-094251-bmuppana-cc417819a7354838/correctness/document-correctness.py", line 438, in start_forever_test
    return test_forever(ns)
  File "/app/deploy/ensembles/20180925-094251-bmuppana-cc417819a7354838/correctness/document-correctness.py", line 378, in test_forever
    (client1, client2, collection1, collection2) = get_clients_and_collections(ns)
  File "/app/deploy/ensembles/20180925-094251-bmuppana-cc417819a7354838/correctness/document-correctness.py", line 43, in get_clients_and_collections
    transactional_shim.remove(collection1)
  File "/app/deploy/ensembles/20180925-094251-bmuppana-cc417819a7354838/correctness/transactional_shim.py", line 8, in func_wrapper
    ret = func(*args, **kwargs)
  File "/app/deploy/ensembles/20180925-094251-bmuppana-cc417819a7354838/correctness/transactional_shim.py", line 47, in func_wrapper
    return func(*args, **kwargs)
  File "/app/deploy/ensembles/20180925-094251-bmuppana-cc417819a7354838/correctness/transactional_shim.py", line 56, in _gen_func
    return getattr(collection, name)(*args, **kwargs)
  File "/app/.python2/lib/python2.7/site-packages/pymongo/collection.py", line 2996, in remove
    spec_or_id, multi, write_concern, collation=collation)
  File "/app/.python2/lib/python2.7/site-packages/pymongo/collection.py", line 1123, in _delete_retryable
    _delete, session)
  File "/app/.python2/lib/python2.7/site-packages/pymongo/mongo_client.py", line 1102, in _retryable_write
    return self._retry_with_session(retryable, func, s, None)
  File "/app/.python2/lib/python2.7/site-packages/pymongo/mongo_client.py", line 1079, in _retry_with_session
    return func(session, sock_info, retryable)
  File "/app/.python2/lib/python2.7/site-packages/pymongo/collection.py", line 1119, in _delete
    retryable_write=retryable_write)
  File "/app/.python2/lib/python2.7/site-packages/pymongo/collection.py", line 1099, in _delete
    _check_write_command_response(result)
  File "/app/.python2/lib/python2.7/site-packages/pymongo/helpers.py", line 207, in _check_write_command_response
    _raise_last_write_error(write_errors)
  File "/app/.python2/lib/python2.7/site-packages/pymongo/helpers.py", line 189, in _raise_last_write_error
    raise WriteError(error.get("errmsg"), error.get("code"), error)
pymongo.errors.WriteError: "Collection metadata changed during operation."

This turned out to be a race condition in the creation of new collection metadata.

On every request, collection metadata is fetched with the function assembleCollectionContext(). This function creates a new collection if the collection is not already present. The collection context is also cached for performance. When a new collection is created, it is not immediately inserted into the cache, because the transaction might still fail. The next request inserts it into the cache.
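To make the caching behaviour concrete, here is a minimal toy model in Python. It is only a sketch: the document layer itself is not written in Python, and the `Transaction` class, the `context_cache` dict and the metadata shape are all hypothetical stand-ins, not the real implementation.

```python
# Toy model only. Everything except the idea behind
# assembleCollectionContext() is a hypothetical stand-in.

class Transaction:
    """Buffers writes and applies them to the shared store on commit."""

    def __init__(self, store):
        self.store = store
        self.pending = {}

    def write(self, key, value):
        self.pending[key] = value

    def commit(self):
        self.store.update(self.pending)
        self.pending = {}


context_cache = {}  # collection name -> cached context


def assemble_collection_context(tr, name):
    """Return (context, created), creating the collection if it is missing."""
    if name in context_cache:
        return context_cache[name], False
    if name in tr.pending:
        # Created earlier in this same transaction; still not cached.
        return tr.pending[name], False
    metadata = tr.store.get(name)
    if metadata is not None:
        # Committed metadata exists, so caching it is safe.
        context_cache[name] = metadata
        return metadata, False
    # Missing: create inside the transaction, but do not cache yet,
    # because the transaction may still fail.
    metadata = {"name": name}
    tr.write(name, metadata)
    return metadata, True


store = {}

# First request creates the collection; the cache stays empty.
tr1 = Transaction(store)
_, created = assemble_collection_context(tr1, "users")
cached_after_create = "users" in context_cache
tr1.commit()

# The next request finds committed metadata and populates the cache.
tr2 = Transaction(store)
_, created_again = assemble_collection_context(tr2, "users")
cached_after_next_request = "users" in context_cache
```

In this model the first request leaves the cache empty and only the second request fills it, which is the window the race described in this issue depends on.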

The race condition happens in the following steps:

  • A write request on an empty collection creates a new collection in the transaction context but does not insert it into the cache.
  • If the request takes too long and is non-isolated, NonIsolatedPlan splits the transaction and commits it, which persists the new collection metadata.
  • Later, before creating a new transaction, it checks that the metadata is still valid. Because of the way assembleCollectionContext() works, this check creates a new context again, which does not match the context that was created earlier. Hence the error.

A couple of things we need to do for this issue:

  • Change assembleCollectionContext() to create a new collection only if asked
  • Only inserts and upserts should create a new collection
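
The two items above could look roughly like the following Python sketch. The `create_if_missing` flag, the `CREATING_OPERATIONS` set, the `context_for` helper and the choice to return `None` for a missing collection are all assumptions made for illustration; the actual signature in the document layer may differ.

```python
# Hypothetical sketch: names and the None-for-missing convention are
# illustrative assumptions, not the document layer's API.

CREATING_OPERATIONS = {"insert", "upsert"}


def assemble_collection_context(metadata, name, create_if_missing=False):
    """Look up a collection context, creating it only when explicitly asked."""
    if name in metadata:
        return metadata[name]
    if not create_if_missing:
        # Reads, deletes and validity checks never create metadata.
        return None
    metadata[name] = {"name": name}
    return metadata[name]


def context_for(metadata, name, operation):
    """Only inserts and upserts are allowed to create the collection."""
    return assemble_collection_context(
        metadata, name, create_if_missing=operation in CREATING_OPERATIONS
    )


metadata = {}

# A remove on a missing collection leaves the metadata untouched.
remove_ctx = context_for(metadata, "users", "remove")
exists_after_remove = "users" in metadata

# An insert creates it.
insert_ctx = context_for(metadata, "users", "insert")
exists_after_insert = "users" in metadata
```

With creation as an opt-in, a metadata validity check can call the same function without creating a second context as a side effect.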
@apkar apkar self-assigned this Jan 28, 2019
@apkar apkar added bug Something isn't working In progress Actively working on the issue labels Jan 29, 2019

apkar commented Jan 29, 2019

Not reproducible with any specific seed. Happens occasionally, 1 out of 3000 runs.

@apkar apkar changed the title from 'Correctness tests occasionally failing with "Collection metadata changed during operation"' to 'New collection creation can fail, by racing with itself on collection metadata' Jan 29, 2019

apkar commented Jan 29, 2019

After looking at the code a bit more, I think the better way of dealing with this is to keep the collection creation in a separate transaction. This weakens the consistency guarantee: if a request that can create a new collection fails, the new collection may still be created.

In fact, that is already the case with the current code, but it happens only with larger requests. Keeping the collection creation in a separate transaction reduces the corner cases and keeps the code simple. And it's not really breaking any consistency guarantees we didn't provide before.
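
As a rough illustration of that trade-off, here is a toy Python sketch. The `Transaction` class, `ensure_collection` and `insert_documents` are hypothetical names used only for this example; the sketch shows that a failed request leaves the collection metadata behind while none of its documents are written.

```python
# Toy model only: all names here are hypothetical.

class Transaction:
    """Buffers writes and applies them to the shared store on commit."""

    def __init__(self, store):
        self.store = store
        self.pending = {}

    def write(self, key, value):
        self.pending[key] = value

    def commit(self):
        self.store.update(self.pending)
        self.pending = {}


def ensure_collection(store, name):
    """Create the collection metadata in its own transaction and commit it."""
    if ("meta", name) in store:
        return
    tr = Transaction(store)
    tr.write(("meta", name), {"name": name})
    tr.commit()


def insert_documents(store, name, docs, fail=False):
    # Creation is committed before the request's own transaction starts.
    ensure_collection(store, name)
    tr = Transaction(store)
    for i, doc in enumerate(docs):
        tr.write(("doc", name, i), doc)
    if fail:
        # The request transaction is dropped without committing.
        raise RuntimeError("simulated request failure")
    tr.commit()


store = {}
try:
    insert_documents(store, "users", [{"a": 1}], fail=True)
except RuntimeError:
    pass

collection_exists = ("meta", "users") in store
documents_written = any(key[0] == "doc" for key in store)
```

Because the metadata is already durable by the time the request transaction begins, a later validity check sees the same committed metadata no matter how the request itself is split.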

@apkar apkar removed the In progress Actively working on the issue label Feb 4, 2019
@apkar apkar added this to the 1.7.0 milestone Feb 14, 2019
@apkar apkar added the In progress Actively working on the issue label Feb 15, 2019
apkar added a commit to apkar/fdb-document-layer that referenced this issue Feb 21, 2019
apkar added a commit to apkar/fdb-document-layer that referenced this issue Feb 21, 2019
@apkar apkar closed this as completed in 557c491 Feb 21, 2019