During reply download, seeing a lot of "Could not emit reply_download_failed: 'NoResultFound' object has no attribute 'uuid'" errors #1095
I'm now doing the following to investigate:
to see if the problem arises again at any point during the sync.
Well, it must have either had something to do with the server state or with what I did during the sync, because on a fresh sync without interacting with the client, I'm not seeing those errors in the logs.
Do you see more than one

This error message is a little confusing, but basically it's saying that we can't tell the GUI that the reply failed because the reply no longer exists. So the reply widget will remain in the GUI until we see that the source no longer exists on the server (this will happen within a sync period).
I still don't know why this would cause a CPU spike and make the client sluggish. If you have the full logs, that'll help, and we can try to repro next week.
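For context, here is a minimal sketch of how a log line like that can be produced; the names are hypothetical stand-ins rather than the client's actual code. The failure handler expects the caught exception to carry the reply's uuid, but if the reply row is already gone and SQLAlchemy's NoResultFound bubbles up instead, accessing .uuid on it raises the AttributeError seen in the logs:

```python
import logging

from sqlalchemy.orm.exc import NoResultFound

logger = logging.getLogger(__name__)


class DownloadException(Exception):
    """Hypothetical failure type that carries the uuid of the reply that failed."""

    def __init__(self, message: str, uuid: str) -> None:
        super().__init__(message)
        self.uuid = uuid


class FakeSignal:
    """Stand-in for the Qt `reply_download_failed` signal."""

    def emit(self, *args) -> None:
        pass


def on_reply_download_failure(exception: Exception, reply_download_failed: FakeSignal) -> None:
    try:
        # Works when `exception` is a DownloadException with a `uuid`; raises an
        # AttributeError when a bare NoResultFound was raised because the reply
        # row no longer exists locally.
        reply_download_failed.emit(exception.uuid, str(exception))
    except Exception as e:
        logger.error("Could not emit reply_download_failed: %s", e)


# Simulating the failure path logs:
# "Could not emit reply_download_failed: 'NoResultFound' object has no attribute 'uuid'"
on_reply_download_failure(NoResultFound("No row was found"), FakeSignal())
```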
I think the following duplicate job scenario would describe the behavior:
That reasoning makes sense to me; I can re-test the behavior with deletion during large syncs once #975 lands. Note that I didn't notice any user impact other than temporary sluggishness and a CPU spike, which I'm not sure was related to this issue. I'll do another re-sync of the 350+ sources without doing anything else, just to observe the performance characteristics a bit more closely.
Ran another 350+ sync, CPU for

I'm curious whether, in addition to the fix in #975, it may be worth removing jobs that have no prospect of succeeding (because a source has been deleted), both to avoid unnecessary network traffic and to avoid any unwanted side effects from those jobs running.
I agree, I think it is worthwhile. We don't have a way to cancel a job yet, but I think that'd be a nice architectural improvement after #975.
If a source is deleted client-side, we already prevent any new jobs on that source. If a sync sees that something server-side has changed for that source (before the job and request to delete it finish), then those jobs will be added to the main priority queue, after the deletion job. This is handled gracefully, but we could check whether the source still exists locally before adding a job to the queue. If it doesn't exist, then we can assume the server just needs to catch up and can drop any new jobs for that source (something like the sketch below).

I think where we might see benefits from this architectural change is if the source is deleted server-side. When the client sees during a sync that a source has been deleted, we could iterate through the queue to drop any jobs tied to that source. However, a job could already be in progress, and in that case we'll still log an error.
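For illustration, a rough sketch of that pre-enqueue check; the model, session, and queue names below are stand-ins I'm making up, not the client's actual internals:

```python
import logging

from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import Session, declarative_base

logger = logging.getLogger(__name__)
Base = declarative_base()


class Source(Base):
    """Minimal stand-in for the client's Source model (illustrative only)."""

    __tablename__ = "sources"
    id = Column(Integer, primary_key=True)
    uuid = Column(String, unique=True, nullable=False)


def enqueue_if_source_exists(session: Session, job_queue, job, source_uuid: str) -> None:
    """Enqueue `job` only if the source it acts on still exists locally."""
    source = session.query(Source).filter_by(uuid=source_uuid).one_or_none()
    if source is None:
        # The source was deleted locally; the server just needs to catch up,
        # so drop the job instead of letting it fail later in the queue.
        logger.debug("Dropping job for deleted source %s", source_uuid)
        return
    job_queue.enqueue(job)
```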
True, but since any client-triggered deletion jobs will run before any already-enqueued message or reply download jobs, even for locally triggered deletion we can run into the situation where pending message/reply downloads are left in the queue for deleted sources.
Yup, you're right. We could also see benefits from queue cleanup after a source is deleted locally, in the case where you delete a source while there are a bunch of reply/message downloads in the queue for that source. It's great that we prevent new source jobs from being added, but any existing jobs in the queue will remain until they are processed (where they can decide to skip making the API request).

I just want to take a look back at #975 and how we can continue to make improvements in a similar way that keeps our architecture simple and coherent. The PR implements a way to check if a task is already queued up so that we don't enqueue it again. This was relevant because the client enqueues download jobs at the end of a metadata sync for any local db messages/replies. We did discuss at one point waiting until a metadata sync completes all sync tasks before enqueuing another metadata sync job, which would have also solved this issue. The path we're taking with PR 975 is more complicated, but it allows us to continue to pick up any new source(s) and show that there are new encrypted messages to download. The other benefit of PR 975 is that we can prevent the user from submitting a job more than once if they click on a button repeatedly. We currently don't have a need for this, but I could see it maybe coming up in the future if we added a refresh button or something like that.

So now we're talking about what to do about jobs that are no longer relevant in the queue. What makes a job no longer relevant? If we delete a source, then we know that any jobs in the queue that act on that source (e.g. starring a source) will gracefully fail and the GUI will respond accordingly. Another scenario might come up if, say, we allow deletion of messages and also have a "mark as read" feature or something like that. We could drop the "mark as read" job if the message is deleted before it is processed.

But how useful is this queue-cleanup feature? The examples I've provided so far are not really an issue, because processing them will quickly resolve things, and we can already avoid making API requests if a source/reply/message no longer exists. I think the benefit will only be seen for a very specific edge case: you are syncing the client for the first time, there are hundreds of message and reply download jobs enqueued for a source, and then that source is deleted. We could also prevent a bunch of failure signals from returning to the GUI, because the jobs would never be processed (we actually might want these failure signals to occur for any user-action jobs that were dropped, though).

Even though I think there is limited value in queue cleanup (please provide more examples if you think I'm wrong), there is an elegant way we can implement this by creating a new high-priority job, perhaps called
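For what that cleanup might look like, here's a rough sketch; the (priority, job) entry layout and the `source_uuid` attribute on jobs are assumptions for illustration, not the client's actual queue internals:

```python
import queue


def drop_jobs_for_source(job_queue: queue.PriorityQueue, source_uuid: str) -> int:
    """Drain the queue and re-enqueue only jobs that do not act on `source_uuid`.

    Sketch only: assumes each entry is a (priority, job) tuple and each job
    carries a `source_uuid` attribute. In the real client this would need to
    run on the queue's own thread (or under its lock) so no job is processed
    mid-cleanup.
    """
    survivors = []
    dropped = 0
    while True:
        try:
            entry = job_queue.get_nowait()
        except queue.Empty:
            break
        _priority, job = entry
        if getattr(job, "source_uuid", None) == source_uuid:
            dropped += 1  # this job can never succeed: its source is gone
        else:
            survivors.append(entry)
    for entry in survivors:
        job_queue.put(entry)
    return dropped
```

A high-priority cleanup job along the lines suggested above could simply run something like this as soon as the deletion is noticed, before any of the remaining download jobs for that source get processed.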
This is still a potential issue. |
I encountered this during QA and I don't have clear steps to reproduce yet. What I've seen is basically the following:
qa_loader.py
client.log, plus very high CPU usage for sd-app and temporary sluggishness of the Client.

Example log line:

Before those log lines, I see groups of lines like this:
Environment type: Qubes staging
Client version: 20200522 nightly
Workstation version: 0.3.0rc1
Server version: Qubes staging 1.3.0