Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement "force" for UrlOperations.delete|upload() #454

Open
mih opened this issue Sep 28, 2023 · 6 comments
Open

Implement "force" for UrlOperations.delete|upload() #454

mih opened this issue Sep 28, 2023 · 6 comments

Comments

@mih
Copy link
Member

mih commented Sep 28, 2023

This is the companion issue for datalad/datalad-ria#103

We need to figure out if and how to deal with upload/delete requests where the associated user has all necessary permissions to successfully complete the operation -- but it would require additional steps (besides the actual upload), e.g. (temporarily) setting a parent directory.

In general, and depending on the protocol, uploads can already be more complex. For example, in some cases it might be desirable for uploads to happen in an atomic fashion (see datalad/datalad-ria#102), which would typically amount to an upload to a tmp location, plus a final rename/move to the actual upload target.

Yet another facet of this is "overwrite-protection". One can have expectations to fail instead of overwriting an existing resource, or to not do that. ATM this is not defined, documented, or uniformly implemented.

@mih
Copy link
Member Author

mih commented Sep 28, 2023

We might need to put a user of UrlOperations handlers in the position to know what they can actually do.

Either we make all uniform, or we make the behavior configurable, or at least queriable.

Example: I am doing an upload

Will it overwrite an existing resource silently, or fail.
If it fails, how?
Is the mitigation to run an explicit delete before, or is there a flag that can enable forcing an overwrite?

If the container is not-writable by default (but could be made writable), will it fail?
If it fails, how?
Is the mitigation to run an explicit <yet-to-be-invented> before, or is there a flag that can enable forcing an write?

Example: I am doing a delete

If something is "protected", should an explicit delete fail, even if the "protection" could be turned off?
If it fails, how?
Is the mitigation to run an explicit <yet-to-be-invented> before, or is there a flag that can enable forcing an deletion?

@christian-monch
Copy link
Contributor

christian-monch commented Oct 4, 2023

TLDR;

I would opt against a forced mode in order to keep the semantics and the set of possible errors of the individual operations simple.

I am not sure, whether we should either let the semantics of the underlying platform determine whether an overwrite is possible, or generally implement on overwrite-protection. The latter might make some UrlOperations implementations less performant.

In detail

We might need to put a user of UrlOperations handlers in the position to know what they can actually do.

Either we make all uniform, or we make the behavior configurable, or at least queriable.
[...]

I would generally opt for a uniform behavior with errors that allow a user to distinguish between causes, i.e. ResourceExists-errors, InsufficientPermission-errors, and other errors.

[...]
Example: I am doing an upload

Will it overwrite an existing resource silently, or fail. If it fails, how? Is the mitigation to run an explicit delete before, or is there a flag that can enable forcing an overwrite?

I think if the upload fails, an explicit delete should be required to keep the API simple. Otherwise, we would introduce additional methods like upload_with_overwrite or flags that modify the semantics of a method like upload(..., force: bool). For example, forced upload might lead to additional errors that can be emitted by upload which indicate that an existing resource could not be overwritten due to missing access rights that did not allow to modify the already existing resource. In my view that makes the usage of upload more complex and less orthogonal. And, unless force is set to True for all invocations, it does not really shrink the decision space of the caller, w.r.t. reacting on an upload-error (more about that in the delete-example below). It does however reduce the amount of user-code that is necessary to deal with an upload error in comparison to an explicit delete-step.

If the container is not-writable by default (but could be made writable), will it fail? If it fails, how? Is the mitigation to run an explicit <yet-to-be-invented> before, or is there a flag that can enable forcing an write?

Example: I am doing a delete

If something is "protected", should an explicit delete fail, even if the "protection" could be turned off? If it fails, how? Is the mitigation to run an explicit <yet-to-be-invented> before, or is there a flag that can enable forcing an deletion?
[...]

I think it might be better to let delete fail and let the user decide how to react to the error. In order for the user to do that properly, the error should be able to indicate that "permissions were insufficient". In the case of this failure, the user's options would be to: a) not perform the delete-operation, or b) disable the protection and execute delete again.

If we would on the other hand provide a force operation, and the user does not set force=True be default (after all, the protection might be there for a reason), and receives a permission error, his options are: a) not perform a forced delete-operation, or b) perform a forced delete-operation.

So the options that provide themselves to the user are not too different in both cases. But if we allow a forced upload or delete, the user might be tempted to always execute those operations in forced mode in order to prevent implementing the decision described above, i.e. whether to redo an operation in forced mode or not.

One other question that comes to mind is the potential necessity of different credentials for remove-operations or access list modification-operations. It would be difficult to take care of that in an "automated" upload- or delete-error-mitigation.

@mih
Copy link
Member Author

mih commented Oct 5, 2023

I think it might be better to let delete fail and let the user decide how to react to the error. In order for the user to do that properly, the error should be able to indicate that "permissions were insufficient". In the case of this failure, the user's options would be to: a) not perform the delete-operation, or b) disable the protection and execute delete again.

This sounds simple, but it not. For the usecase of an ora remote implementation (b) is a standard case. An annex key would be in a write protected key directory. A user running a REMOVE operation in the remote, both certain that they want to remove, and expecting the write protection.

This means that a user needs to be given means to "disable the protection". This implies an extension of the UrlOperations API to allow for that. A "setting permissions" operation is probably of substantial complexity, because it somehow needs to have a meaningful API across protocols and underlying platforms. I would not even know where to start.

Remove a key would then require something like the following algorithm to success (to be implemented by a user):

  • try remove the key file -> fail
  • try obtaining write permissions in parent
  • try remove the key file -> likely succeed
  • try remove the key dir -> likely succeed
  • try remove the 2nd hashdir
  • if that fails, distinguish between lack of permissions and other content errors, and try again
  • report for 1st hashdir

@christian-monch
Copy link
Contributor

[...] An annex key would be in a write protected key directory. [...]

Maybe I don't understand all the modes of access to a ria-store or I am mistaken about the ORA-interface, but why would the key be in a write protected key directory? Do we envision git-annex to have direct access to the files that comprise the RIA-store?

@adswa
Copy link
Member

adswa commented Oct 18, 2023

Here is a file tree and the permissions of the different elements from a random ria store I found on juseless:

adina@juseless in /data/group/psyinf
❱ tree -p dataset_store/946/e8cac-432b-11ea-aac8-f0d5bf7b5561/annex/              
[drwxrws---]  dataset_store/946/e8cac-432b-11ea-aac8-f0d5bf7b5561/annex/
├── [-rw-rw----]  index
├── [-rw-------]  index.lck
├── [drwxrws---]  journal
│   └── [-rw-rw----]  uuid.log
├── [-rw-rw----]  journal.lck
├── [drwxrws---]  objects
│   ├── [drwxrws---]  6q
│   │   └── [drwxrws---]  mZ
│   │       └── [drwxrws---]  MD5E-s93567133--7c93fc5d0b5f197ae8a02e5a89954bc8.flac
│   │           └── [-r--r-----]  MD5E-s93567133--7c93fc5d0b5f197ae8a02e5a89954bc8.flac
│   ├── [drwxrws---]  6v
│   │   └── [drwxrws---]  zK
│   │       └── [drwxrws---]  MD5E-s2043924480--47718be3b53037499a325cf1d402b2be.MTS
│   │           └── [-r--r-----]  MD5E-s2043924480--47718be3b53037499a325cf1d402b2be.MTS
[...]
└── [-rw-rw----]  othertmp.lck

30 directories, 14 files

I think the directories aren't write protected. But I believe git-annex has direct access to most of the files in the RIA store EDIT: I stand corrected, I found datalad/datalad#5668 where Benjamin says that git-annex shouldn't know about about the location of keys, at least for the previous implementation. But I am not sure if this still fits our development targets

@mih
Copy link
Member Author

mih commented Jan 26, 2024

Waiting for #596

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants