Disable retry for deletion operations #676
Comments
Just chatted with @redshiftzero about this issue who pointed out that the deletion job will still be in the queue when the queue is paused and can potentially be processed at a much later time, depending on when network issues are resolved and the MetadataSyncJob succeeds and resumes the queues. Disabling "Retry" for deletion operations doesn't actually remove the failed job from the queue. We also don't remove jobs from the queue based on how long they've been sitting there. In order to avoid the scenario where a deletion happens without the user realizing it, we should:
|
just want to get @rmol's attention on this as well |
I'm missing how the deletion would happen without the user realizing it. The job was created because they chose to delete the source. If the job failed, I think it should be retried until their intent is carried out. But if we require or permit manual retry, that should just work. We don't need to disable retry for deletions, as deletion should be idempotent. This is discussed at the end of section 3.1 in RFC 7232. It's legitimate to respond with a successful status code if the requested state of the resource matches its current state. We could send the Right now the SecureDrop API returns a 404 if the source to be deleted is not found, so that would need to change. |
I see your point that if the user says to do it, we should try to do it for as long as it takes, and they shouldn't be surprised by that. It's what we're currently doing. Certain operations like sending a reply are time-sensitive because you might want to change your message if it's been a long time since you attempted to send it, but maybe deleting a source isn't time-sensitive and we should retry for as long as possible. My concern is that it might be surprising to retry a job for X minutes or hours or days. That's where I'm coming from.
My point isn't that I'm concerned about idempotent deletions so much as I'm concerned that retrying until the client is closed might be the wrong approach. Sometimes it makes sense for users to manually try again after a certain number of attempts or once enough time has gone by. Sometimes we should accept that a job failed and should be removed from the queue. Again, I think this is more important for replies, but it is worth thinking about for deletions and other operations. As far as idempotency goes, I like your suggestion about sending a successful status code if the Source has already been deleted. +1 to that! That's solving a different issue, but I agree. |
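To make the "give up after enough attempts or enough time" idea concrete, here is a rough sketch. Everything in it (`QueuedJob`, `should_give_up`, and both thresholds) is hypothetical and not taken from the client code; the limits are placeholders, not proposed values.

```python
from dataclasses import dataclass

# Hypothetical limits, chosen only for illustration.
MAX_ATTEMPTS = 5
MAX_AGE_SECONDS = 3600.0


@dataclass
class QueuedJob:
    """Hypothetical stand-in for a queued operation such as a deletion."""
    name: str
    created_at: float  # seconds since the epoch when the job was enqueued
    attempts: int = 0  # how many times the job has been tried so far


def should_give_up(job: QueuedJob, now: float) -> bool:
    """Return True once the job has been tried too often or has sat too long."""
    too_many_attempts = job.attempts >= MAX_ATTEMPTS
    too_old = (now - job.created_at) >= MAX_AGE_SECONDS
    return too_many_attempts or too_old
```

A job for which `should_give_up` returns True would be removed from the queue and surfaced to the user for a manual retry rather than retried silently.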
@rmol - right now a 404 doesn't cause the queue to pause; we drop jobs that fail for anything other than the API being inaccessible or the request timing out. So if we make a bunch of deletion requests on the same object, we'll just get back a 404 and move on to the next job in the queue. I'm realizing that if we do decide to switch to returning 200 or 204, then we will continuously tell the user that the Source has been deleted. It might be best to continue to use 404 and silently move on the way we're currently doing. |
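For readers following along, the queue behavior described in this comment could be sketched roughly as below. This is a simplified illustration, not the client's actual queue: `JobQueue`, `ApiInaccessibleError`, and `RequestTimeoutError` are hypothetical names, and the real implementation may differ.

```python
from collections import deque


class ApiInaccessibleError(Exception):
    """Hypothetical: raised when the server cannot be reached."""


class RequestTimeoutError(Exception):
    """Hypothetical: raised when a request times out."""


class JobQueue:
    """Simplified sketch: pause on connectivity failures, drop other failures."""

    def __init__(self):
        self.jobs = deque()
        self.paused = False
        self.dropped = []  # (job, error) pairs for jobs that were given up on

    def add(self, job):
        self.jobs.append(job)

    def process(self):
        while self.jobs and not self.paused:
            job = self.jobs[0]
            try:
                job()
            except (ApiInaccessibleError, RequestTimeoutError):
                # Connectivity problem: keep the job at the head and pause.
                self.paused = True
            except Exception as error:
                # Any other failure (e.g. a 404): drop the job and move on.
                self.jobs.popleft()
                self.dropped.append((job, error))
            else:
                self.jobs.popleft()

    def resume(self):
        # Called once connectivity is restored; the paused job runs again.
        self.paused = False
        self.process()
```

Note that in this sketch a paused deletion job stays at the head of the queue, so it runs again whenever `resume` is called, however much later that is.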
I think it's valid to interpret a 404 from a DELETE request as an indication that the resource is in the desired state. But we just drop jobs if we get 4xx or 5xx from the server? Posting a reply to a deleted source wouldn't result in any kind of error message? |
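As a small illustration of treating a 404 on DELETE as "already in the desired state" while avoiding repeated notifications, something like the following could work. The function names and outcome strings are hypothetical, not part of the SecureDrop client or API.

```python
def deletion_outcome(status_code: int) -> str:
    """Map the HTTP status of a DELETE request to a client-side outcome."""
    if status_code in (200, 204):
        return "deleted"       # the server carried out the deletion just now
    if status_code == 404:
        return "already_gone"  # the source no longer exists: desired state reached
    return "failed"            # anything else is treated as a real failure


def should_announce_deletion(outcome: str) -> bool:
    """Only tell the user about a deletion the first time it actually happens."""
    return outcome == "deleted"
```

With this split, duplicate deletion jobs for the same source resolve to `"already_gone"` and finish silently instead of re-announcing the deletion.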
The plan of record is to remove the explicit Retry (#811) but an implicit retry would still happen after connectivity is restored. I think the way to avoid confusion about the state of the client is to implement a minimal iteration of a pending deletion state (#534) and to prompt the user about unfinished operations (#420). We may want to separately offer a "Cancel" option for pending deletions, very similar to a "cancel reply" option (#810). However, all of these are different ideas than the scope of this issue, which I would recommend closing. |
Closing as per above discussion |
Description
Clicking on "Retry" in the error bar resumes the queue and retries the last failed job. If the job is to delete a Source then we need to either:
This is to avoid the scenario where a user clicks on "Retry" without knowing exactly which task they are retrying and unknowingly deleting a Source.