Implement "force" for `UrlOperations.delete|upload()` #454

mih · 2023-09-28T07:31:34Z

This is the companion issue for datalad/datalad-ria#103

We need to figure out if and how to deal with upload/delete requests where the associated user has all necessary permissions to successfully complete the operation -- but it would require additional steps (besides the actual upload), e.g. (temporarily) setting a parent directory.

In general, and depending on the protocol, uploads can already be more complex. For example, in some cases it might be desirable for uploads to happen in an atomic fashion (see datalad/datalad-ria#102), which would typically amount to an upload to a tmp location, plus a final rename/move to the actual upload target.

Yet another facet of this is "overwrite-protection". One can have expectations to fail instead of overwriting an existing resource, or to not do that. ATM this is not defined, documented, or uniformly implemented.

mih · 2023-09-28T14:29:19Z

We might need to put a user of UrlOperations handlers in the position to know what they can actually do.

Either we make all uniform, or we make the behavior configurable, or at least queriable.

Example: I am doing an upload

Will it overwrite an existing resource silently, or fail.
If it fails, how?
Is the mitigation to run an explicit delete before, or is there a flag that can enable forcing an overwrite?

If the container is not-writable by default (but could be made writable), will it fail?
If it fails, how?
Is the mitigation to run an explicit <yet-to-be-invented> before, or is there a flag that can enable forcing an write?

Example: I am doing a delete

If something is "protected", should an explicit delete fail, even if the "protection" could be turned off?
If it fails, how?
Is the mitigation to run an explicit <yet-to-be-invented> before, or is there a flag that can enable forcing an deletion?

christian-monch · 2023-10-04T10:23:23Z

TLDR;

I would opt against a forced mode in order to keep the semantics and the set of possible errors of the individual operations simple.

I am not sure, whether we should either let the semantics of the underlying platform determine whether an overwrite is possible, or generally implement on overwrite-protection. The latter might make some UrlOperations implementations less performant.

In detail

We might need to put a user of UrlOperations handlers in the position to know what they can actually do.

Either we make all uniform, or we make the behavior configurable, or at least queriable.
[...]

I would generally opt for a uniform behavior with errors that allow a user to distinguish between causes, i.e. ResourceExists-errors, InsufficientPermission-errors, and other errors.

[...]
Example: I am doing an upload

Will it overwrite an existing resource silently, or fail. If it fails, how? Is the mitigation to run an explicit delete before, or is there a flag that can enable forcing an overwrite?

I think if the upload fails, an explicit delete should be required to keep the API simple. Otherwise, we would introduce additional methods like upload_with_overwrite or flags that modify the semantics of a method like upload(..., force: bool). For example, forced upload might lead to additional errors that can be emitted by upload which indicate that an existing resource could not be overwritten due to missing access rights that did not allow to modify the already existing resource. In my view that makes the usage of upload more complex and less orthogonal. And, unless force is set to True for all invocations, it does not really shrink the decision space of the caller, w.r.t. reacting on an upload-error (more about that in the delete-example below). It does however reduce the amount of user-code that is necessary to deal with an upload error in comparison to an explicit delete-step.

If the container is not-writable by default (but could be made writable), will it fail? If it fails, how? Is the mitigation to run an explicit <yet-to-be-invented> before, or is there a flag that can enable forcing an write?

Example: I am doing a delete

If something is "protected", should an explicit delete fail, even if the "protection" could be turned off? If it fails, how? Is the mitigation to run an explicit <yet-to-be-invented> before, or is there a flag that can enable forcing an deletion?
[...]

I think it might be better to let delete fail and let the user decide how to react to the error. In order for the user to do that properly, the error should be able to indicate that "permissions were insufficient". In the case of this failure, the user's options would be to: a) not perform the delete-operation, or b) disable the protection and execute delete again.

If we would on the other hand provide a force operation, and the user does not set force=True be default (after all, the protection might be there for a reason), and receives a permission error, his options are: a) not perform a forced delete-operation, or b) perform a forced delete-operation.

So the options that provide themselves to the user are not too different in both cases. But if we allow a forced upload or delete, the user might be tempted to always execute those operations in forced mode in order to prevent implementing the decision described above, i.e. whether to redo an operation in forced mode or not.

One other question that comes to mind is the potential necessity of different credentials for remove-operations or access list modification-operations. It would be difficult to take care of that in an "automated" upload- or delete-error-mitigation.

mih · 2023-10-05T07:20:23Z

I think it might be better to let delete fail and let the user decide how to react to the error. In order for the user to do that properly, the error should be able to indicate that "permissions were insufficient". In the case of this failure, the user's options would be to: a) not perform the delete-operation, or b) disable the protection and execute delete again.

This sounds simple, but it not. For the usecase of an ora remote implementation (b) is a standard case. An annex key would be in a write protected key directory. A user running a REMOVE operation in the remote, both certain that they want to remove, and expecting the write protection.

This means that a user needs to be given means to "disable the protection". This implies an extension of the UrlOperations API to allow for that. A "setting permissions" operation is probably of substantial complexity, because it somehow needs to have a meaningful API across protocols and underlying platforms. I would not even know where to start.

Remove a key would then require something like the following algorithm to success (to be implemented by a user):

try remove the key file -> fail
try obtaining write permissions in parent
try remove the key file -> likely succeed
try remove the key dir -> likely succeed
try remove the 2nd hashdir
if that fails, distinguish between lack of permissions and other content errors, and try again
report for 1st hashdir

christian-monch · 2023-10-05T08:07:21Z

[...] An annex key would be in a write protected key directory. [...]

Maybe I don't understand all the modes of access to a ria-store or I am mistaken about the ORA-interface, but why would the key be in a write protected key directory? Do we envision git-annex to have direct access to the files that comprise the RIA-store?

adswa · 2023-10-18T11:58:08Z

Here is a file tree and the permissions of the different elements from a random ria store I found on juseless:

adina@juseless in /data/group/psyinf
❱ tree -p dataset_store/946/e8cac-432b-11ea-aac8-f0d5bf7b5561/annex/              
[drwxrws---]  dataset_store/946/e8cac-432b-11ea-aac8-f0d5bf7b5561/annex/
├── [-rw-rw----]  index
├── [-rw-------]  index.lck
├── [drwxrws---]  journal
│   └── [-rw-rw----]  uuid.log
├── [-rw-rw----]  journal.lck
├── [drwxrws---]  objects
│   ├── [drwxrws---]  6q
│   │   └── [drwxrws---]  mZ
│   │       └── [drwxrws---]  MD5E-s93567133--7c93fc5d0b5f197ae8a02e5a89954bc8.flac
│   │           └── [-r--r-----]  MD5E-s93567133--7c93fc5d0b5f197ae8a02e5a89954bc8.flac
│   ├── [drwxrws---]  6v
│   │   └── [drwxrws---]  zK
│   │       └── [drwxrws---]  MD5E-s2043924480--47718be3b53037499a325cf1d402b2be.MTS
│   │           └── [-r--r-----]  MD5E-s2043924480--47718be3b53037499a325cf1d402b2be.MTS
[...]
└── [-rw-rw----]  othertmp.lck

30 directories, 14 files

I think the directories aren't write protected. But I believe git-annex has direct access to most of the files in the RIA store EDIT: I stand corrected, I found datalad/datalad#5668 where Benjamin says that git-annex shouldn't know about about the location of keys, at least for the previous implementation. But I am not sure if this still fits our development targets

mih · 2024-01-26T07:55:08Z

Waiting for #596

mih mentioned this issue Sep 28, 2023

Document the challenge to implement RIA store cleanups datalad/datalad-ria#104

Open

mih mentioned this issue Sep 28, 2023

RIA-specific URL handlers datalad/datalad-ria#105

Open

mih mentioned this issue Nov 23, 2023

Add capability introspection to url operations framework #542

Open

christian-monch mentioned this issue Mar 27, 2024

Atomic UrlOperations.upload for ssh- and file-URLs #649

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement "force" for `UrlOperations.delete|upload()` #454

Implement "force" for `UrlOperations.delete|upload()` #454

mih commented Sep 28, 2023 •

edited

Loading

mih commented Sep 28, 2023 •

edited

Loading

christian-monch commented Oct 4, 2023 •

edited

Loading

mih commented Oct 5, 2023

christian-monch commented Oct 5, 2023

adswa commented Oct 18, 2023 •

edited

Loading

mih commented Jan 26, 2024

Implement "force" for UrlOperations.delete|upload() #454

Implement "force" for UrlOperations.delete|upload() #454

Comments

mih commented Sep 28, 2023 • edited Loading

mih commented Sep 28, 2023 • edited Loading

christian-monch commented Oct 4, 2023 • edited Loading

TLDR;

In detail

mih commented Oct 5, 2023

christian-monch commented Oct 5, 2023

adswa commented Oct 18, 2023 • edited Loading

mih commented Jan 26, 2024

Implement "force" for `UrlOperations.delete|upload()` #454

Implement "force" for `UrlOperations.delete|upload()` #454

mih commented Sep 28, 2023 •

edited

Loading

mih commented Sep 28, 2023 •

edited

Loading

christian-monch commented Oct 4, 2023 •

edited

Loading

adswa commented Oct 18, 2023 •

edited

Loading