Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in creating directories on PUT #114

Open
paulmillar opened this issue Jan 4, 2019 · 6 comments
Open

Race condition in creating directories on PUT #114

paulmillar opened this issue Jan 4, 2019 · 6 comments
Assignees
Labels

Comments

@paulmillar
Copy link
Contributor

PutHandler contains the following code:

                Resource r = parent.child(path.getName());

                if (r == null) {
                        log.info("Could not find child: " + path.getName() + " in parent: " + parent.getName() + " - " + parent.getClass());
                        if (parent instanceof MakeCollectionableResource) {
                                MakeCollectionableResource mkcol = (MakeCollectionableResource) parent;
                                if (!handlerHelper.checkAuthorisation(manager, mkcol, request)) {
                                        throw new NotAuthorizedException(mkcol);
                                }
                                log.info("autocreating new folder: " + path.getName());
                                CollectionResource newCol = mkcol.createCollection(path.getName());
                                manager.getEventManager().fireEvent(new NewFolderEvent(newCol));
                                return newCol;
                        } else {
                                log.info("parent folder isnt a MakeCollectionableResource: " + parent.getName() + " - " + parent.getClass());
                                return null;
                        }
                } else if (r instanceof CollectionResource) {

The problem is that, if there are many concurrent requests targeting the same non-existing path then it is possible for a collection that does not exist for parent.child(path.getName()) to be created by the time mkcol.createCollection(path.getName()) is called.

RFC 4918 is a little vague on how MKCOL should react if the entity already exists: it says the request must fail, but does not indicate with which status code. I believe 400 Bad Request (i.e., BadRequestException) is the expected response, despite not being listed in RFC 4918 § 9.3.1. The MakeCollectionableResource interface supports BadRequestException but the JavaDoc doesn't describe the expectation (i.e., when this exception is to be thrown).

Assuming BadRequestException is the expected reaction, one way of fixing this would be to catch this exception and retry the parent.child(path.getName()) request.

@paulmillar
Copy link
Contributor Author

Note that the same problem exists in MkColHandler:

		Resource existingChild = existingCol.child(newName);
		if (existingChild != null) {
		        // ...
			responseHandler.respondMethodNotAllowed(existingChild, response, request);
			return;
		}
		CollectionResource made = creator.createResource(existingCol, newName, request);

Multiple concurrent MKCOL requests targeting the same missing collection could all see a null value for existingChild, so multiple threads will attempt to call CollectionResourceCreator#createResource, which (by default), calls MakeCollectionableResource#createCollection.

Interestingly, the code here returns status code 405 Method Not Allowed (which I see now is actually the correct code).

Therefore, I believe the correct solution here is to add a new throwable exception in the MakeCollectionableResource#createCollection method's signature to indicate the resource already exists, and update MkColHandler and PutHandler to react correctly to that exception.

@bradmac bradmac self-assigned this Jan 6, 2019
@bradmac
Copy link
Contributor

bradmac commented Jan 6, 2019

Assuming BadRequestException is the expected reaction, one way of fixing this would be to catch this exception and retry the parent.child(path.getName()) request.

I think that would be exceeding the scope of milton's functionality. Milton isnt responsible for transaction handling/isolation etc.

Concurrency is intended to be handled in webdav through locking. Most operating system clients use an elaborate sequence of actions to ensure concurrency, eg create a zero byte file with a temp name, lock it, upload to it, then move it to the desired name.

In addition there is a feature defined in the webdav spec referred to as the 'lock null' process where a resource can be locked before it exists. Milton fully supports lock-null, but i understand Mac Finder is the only OS client which supports it.

@bradmac
Copy link
Contributor

bradmac commented Jan 6, 2019

I think if you have an implementation with concurrency problems the best way to handle it is

  • ensure you support lock-null (if Finder is a supported client)
  • hold a list of pending resource creation operations, and use that to validate resource names when a create request is received. This in effect intentionally breaks transaction isolation for this case.

@bradmac bradmac assigned paulmillar and unassigned bradmac Jan 6, 2019
@paulmillar
Copy link
Contributor Author

Thanks for the reply, but I believe we are talking somewhat at cross-purposes.

I appreciate that WebDAV support locking; however here I am interested in how Milton supports multiple clients, each issuing a single PUT request.

To give you a concrete example: I discovered this problem when attempting to use JMeter to benchmark performance characteristics. My simple test-case has JMeter with a threadgroup (=> simulating multiple clients), where each is uploading a unique file: thread-1 uploads to /public/jmeter/thread-1.dat, thread-2 uploads to /public/jmeter/thread-2.dat, etc.

Each client issues a PUT request at (more or less) the same time, as the test-case starts. This means that the jetty server will process these PUT requests will concurrently, using different threads.

The directory /public/jmeter does not exist before the test starts. Multiple jetty/milton threads notice parent.child(path.getName()) returning null (for a /public parent and jmeter as the name). Therefore, these multiple threads all call createCollection. Naturally, only the first thread will succeed, and all other threads will fail.

Thanks for the hint about lock-null -- I'll check that dCache supports this.

However, we have clients that do not lock all resources in the path (/, /public /public/jmeter, /public/jmeter/thread-1.dat) before issuing a PUT request, so I believe the lock-null support would likely have limited effect.

I also don't believe your second solution really solves the problem for a few reasons:

  1. holding the list of pending creations doesn't fix the race condition, it just reduces the window (makes it less likely)
  2. a typical production deployment has multiple servers. This list of pending creations would need to be synchronised across all such servers. Achieving agreement reliably across multiple servers is expensive in terms of performance (see algorithms like PAXOS, RAFT; and software like ZooKeeper).

My proposed solution is relatively simple and backwards compatible: update the MakeCollectionableResource interface to allow the method to throw an exception if the resource already exists.

@bradmac
Copy link
Contributor

bradmac commented Jan 7, 2019

I think i do understand what you mean. But the example above is invalid. For your JMeter script to be correct it should use locking as dav clients are required to do.

Milton's semantics are correct, in that if a resource exists we respond with a 405 as above. Milton tries to take on the burden of complying with protocol requirements rather then imposing that on implmentation developers. So we want to retain the current functionality where we check to see if a resource exists before calling the createCollection method.

I dont believe that persisting state across clusters is impractical. There are many good tools for doing this. In fact you need to do this with locks (including lock-null), and you should do it for nonces, for milton to work correctly in a clustered environment.

So to summarise

  • i think its reasonable for a server to assume clients will correctly implement locking
  • and you can optimise your solution by maintaining a list of in-progress, but uncommited, operations.

@bradmac
Copy link
Contributor

bradmac commented Jan 7, 2019

Also, you can configure the milton stack to use your own MkColHandler, which could remove the check to see whether the resource exists. Milton is designed to be pluggable for this reason.

paulmillar added a commit to dCache/dcache that referenced this issue Jan 23, 2019
Motivation:

When processing a PUT request, milton checks whether parent paths exist
and creates them if not.  Unfortunately, this process is racy:
concurrent PUTs that target the same missing directory risk concurrent
createCollection method calls with the same target.  If this happens all
but one createCollection call will fail, resulting in the corresponding
PUT requests failing.

The problem is described here:

    miltonio/milton2#114

Modification:

Add a work-around where dCache pretends the createCollection call is
successful, allowing the PUT request to succeed.

Result:

Fix a problem where all but one requests fail, if multiple concurrent
PUT requests have directories in the path that do not already exist.

Target: master
Requires-notes: yes
Requires-book: no
Request: 5.0
Request: 4.2
Request: 4.1
Request: 4.0
Request: 3.2
Patch: https://rb.dcache.org/r/11480/
Acked-by: Olufemi Adeyemi
paulmillar added a commit to paulmillar/dcache that referenced this issue Jan 23, 2019
Motivation:

When processing a PUT request, milton checks whether parent paths exist
and creates them if not.  Unfortunately, this process is racy:
concurrent PUTs that target the same missing directory risk concurrent
createCollection method calls with the same target.  If this happens all
but one createCollection call will fail, resulting in the corresponding
PUT requests failing.

The problem is described here:

    miltonio/milton2#114

Modification:

Add a work-around where dCache pretends the createCollection call is
successful, allowing the PUT request to succeed.

Result:

Fix a problem where all but one requests fail, if multiple concurrent
PUT requests have directories in the path that do not already exist.

Target: master
Requires-notes: yes
Requires-book: no
Request: 5.0
Request: 4.2
Request: 4.1
Request: 4.0
Request: 3.2
Patch: https://rb.dcache.org/r/11480/
Acked-by: Olufemi Adeyemi
paulmillar added a commit to paulmillar/dcache that referenced this issue Jan 23, 2019
Motivation:

When processing a PUT request, milton checks whether parent paths exist
and creates them if not.  Unfortunately, this process is racy:
concurrent PUTs that target the same missing directory risk concurrent
createCollection method calls with the same target.  If this happens all
but one createCollection call will fail, resulting in the corresponding
PUT requests failing.

The problem is described here:

    miltonio/milton2#114

Modification:

Add a work-around where dCache pretends the createCollection call is
successful, allowing the PUT request to succeed.

Result:

Fix a problem where all but one requests fail, if multiple concurrent
PUT requests have directories in the path that do not already exist.

Target: master
Requires-notes: yes
Requires-book: no
Request: 5.0
Request: 4.2
Request: 4.1
Request: 4.0
Request: 3.2
Patch: https://rb.dcache.org/r/11480/
Acked-by: Olufemi Adeyemi
paulmillar added a commit to paulmillar/dcache that referenced this issue Jan 23, 2019
Motivation:

When processing a PUT request, milton checks whether parent paths exist
and creates them if not.  Unfortunately, this process is racy:
concurrent PUTs that target the same missing directory risk concurrent
createCollection method calls with the same target.  If this happens all
but one createCollection call will fail, resulting in the corresponding
PUT requests failing.

The problem is described here:

    miltonio/milton2#114

Modification:

Add a work-around where dCache pretends the createCollection call is
successful, allowing the PUT request to succeed.

Result:

Fix a problem where all but one requests fail, if multiple concurrent
PUT requests have directories in the path that do not already exist.

Target: master
Requires-notes: yes
Requires-book: no
Request: 5.0
Request: 4.2
Request: 4.1
Request: 4.0
Request: 3.2
Patch: https://rb.dcache.org/r/11480/
Acked-by: Olufemi Adeyemi
paulmillar added a commit to paulmillar/dcache that referenced this issue Jan 23, 2019
Motivation:

When processing a PUT request, milton checks whether parent paths exist
and creates them if not.  Unfortunately, this process is racy:
concurrent PUTs that target the same missing directory risk concurrent
createCollection method calls with the same target.  If this happens all
but one createCollection call will fail, resulting in the corresponding
PUT requests failing.

The problem is described here:

    miltonio/milton2#114

Modification:

Add a work-around where dCache pretends the createCollection call is
successful, allowing the PUT request to succeed.

Result:

Fix a problem where all but one requests fail, if multiple concurrent
PUT requests have directories in the path that do not already exist.

Target: master
Requires-notes: yes
Requires-book: no
Request: 5.0
Request: 4.2
Request: 4.1
Request: 4.0
Request: 3.2
Patch: https://rb.dcache.org/r/11480/
Acked-by: Olufemi Adeyemi
paulmillar added a commit to paulmillar/dcache that referenced this issue Jan 23, 2019
Motivation:

When processing a PUT request, milton checks whether parent paths exist
and creates them if not.  Unfortunately, this process is racy:
concurrent PUTs that target the same missing directory risk concurrent
createCollection method calls with the same target.  If this happens all
but one createCollection call will fail, resulting in the corresponding
PUT requests failing.

The problem is described here:

    miltonio/milton2#114

Modification:

Add a work-around where dCache pretends the createCollection call is
successful, allowing the PUT request to succeed.

Result:

Fix a problem where all but one requests fail, if multiple concurrent
PUT requests have directories in the path that do not already exist.

Target: master
Requires-notes: yes
Requires-book: no
Request: 5.0
Request: 4.2
Request: 4.1
Request: 4.0
Request: 3.2
Patch: https://rb.dcache.org/r/11480/
Acked-by: Olufemi Adeyemi
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants