Ensure UI remains responsive at all times during sync with server. #733
Conversation
this is almost certainly due to this logic: after sync, we attempt to re-import every key for every source (from the server). If we rework the logic on the client side to only import keys for sources that do not yet have keys (to do in a follow-up), that should significantly reduce CPU usage.
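A minimal sketch of that follow-up idea, using the python-gnupg package as a stand-in for the client's actual GPG wiring (the `fingerprint` and `public_key` fields on sources are assumptions here, not the client's real model):

```python
import gnupg

def import_missing_keys(gpg: gnupg.GPG, sources: list) -> None:
    # Fingerprints of keys already present in the local keyring.
    known = {key["fingerprint"] for key in gpg.list_keys()}
    for source in sources:
        # Skip sources whose key we already hold; only genuinely new
        # sources should cost us a gpg2 import process after a sync.
        if source.fingerprint not in known:
            gpg.import_keys(source.public_key)
```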
So currently we call `update_sources` whenever something new arrives. Since we want to display new messages and replies and new sources on the fly, we refresh the source list every time we download something new that needs to be displayed. I think we should think about removing this call from more places than just the message-download path.
It would make sense that removing the GUI update of the source list for each new message would speed this up.
so currently we call `sync_api` in a few more places (the big removal of `sync_api` is not yet finished). I am not seeing 20-second-interval metadata syncs unless I change this timer to be 20 seconds: https://github.com/freedomofpress/securedrop-client/blob/master/securedrop_client/logic.py#L227. One way to see more frequent API calls is if a queue job fails, which will retry. The default timeout is 20 seconds for messages and replies, so perhaps there was some retrying happening.
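For reference, the timer linked above is roughly of this shape (a sketch, not the actual code in logic.py; the interval value here is made up):

```python
from PyQt5.QtCore import QCoreApplication, QTimer

app = QCoreApplication([])

# The metadata sync runs on a fixed-interval Qt timer; unless this interval
# is deliberately set to 20000 ms, it is not what causes 20-second API hits.
sync_timer = QTimer()
sync_timer.setInterval(5 * 60 * 1000)  # illustrative: every five minutes
sync_timer.timeout.connect(lambda: print("sync_api would run here"))
sync_timer.start()

app.exec_()  # the timer only fires while the Qt event loop runs
```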
This is something we have talked about and could do on the server side, which seems ideal, but until then we could improve the client side of things by only updating the GUI source list after we get all the sources, not whenever there is a new message or reply (we will already have the source for that message in the source list; recall that we update the source list after a metadata sync).
I do believe changing the code so that we only call `update_sources` after a full sync would help.
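A sketch of what that change could look like on the controller side (the class and handler names here are illustrative, not the client's exact ones):

```python
class Controller:
    """Illustrative fragment only; not the client's real Controller."""

    def on_sync_success(self) -> None:
        # One full refresh of the source list per metadata sync.
        sources = self.session.query(Source).all()  # assumes a SQLAlchemy session
        self.gui.update_sources(sources)

    def on_message_download_success(self, message) -> None:
        # No update_sources() here: the source is already in the list, so
        # only that source's conversation view needs refreshing.
        self.gui.update_conversation(message.source)
```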
Note: that was the thinking behind this, but actually it's unnecessary, as you've discovered @ntoll.
@ntoll I tried with 200 sources and the GPG imports only took a few seconds and weren't killing my machine, but this branch skips the import for keys we already have. It's based on this branch (…).
@rmol @creviera OK... I've merged your skip-gpg branch and removed the `update_sources` calls from various places. I'm about to embark on testing against a 200-source version of the server and checking the various aspects of the behaviour I've inevitably changed. I also want to update a few other things to make sure that when new messages arrive the source preview is updated properly.
This has been an interesting "building-a-house-of-cards" like exercise to get to this state: work slowly and carefully, but one wrong move and it all falls down. All the tests were done against a server with 200 sources. This is currently still a WiP, but I feel I've made good progress both in diagnosing and understanding the problem from the client side of things and in coming up with a fix.

The retries after 20 seconds that I noticed were because the MetadataJob had an API call timeout of 20 seconds, which wasn't long enough for the file to download (on my machine). I'm not sure why this is the case on my machine, but the side effect is that the job would always fail (time out) and then try again, and I'd be stuck in a state of never-completed syncs. This was what was causing the CPU load. I've increased the timeout value and the sync successfully completes.

Then I noticed that while the UI was responsive in that it wasn't frozen, it didn't look as if it were doing anything while the sources were being downloaded. So I have bodged an animation together, which will definitely need the magical super powers of @ninavizz to provide a 16x16 looping animated gif of an indicator to show the client is doing something. I also ensured that the status message was updated to indicate things were "in flight". This, too, needs checking in terms of UI / messaging. However, and please remember this is a temporary UI bodge, this vastly improves the UX in terms of responsiveness. As you can see, the UI is always responsive.

In addition, I also managed to crash the client several times and eventually found that was due to some faulty logic in the updated queue code.

Finally, as messages, files and replies are downloaded from the server, the source list's "preview last message" also needs updating. This is covered in another ticket (#707) but it feels related to all this work. I won't fix it here, but once this is merged I want to nail that down, now that I've become quite familiar with the code I'll be changing through the work done in this PR. ;-)

I'd like feedback, poking with a stick and comments about the content of this PR (remember it's got @rmol's branch for GPG key import improvements merged in too). My aim, assuming folks don't find gotchas, is to get this PR revised with comprehensive unit tests (the work you see so far is the result of slow/careful exploration, so it's a bit like a "spike" at the moment) and incorporate an appropriate spinner from @ninavizz. Phew.
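The animation bodge described above amounts to something like this (a sketch; `load_movie` stands in for whatever resource-loading helper the client uses, and the gif path is assumed):

```python
from PyQt5.QtGui import QMovie
from PyQt5.QtWidgets import QApplication, QLabel

app = QApplication([])

def load_movie(filename: str) -> QMovie:
    # Assumed helper: loads a looping animation from an images directory.
    return QMovie("images/" + filename)

sync_indicator = QLabel()
animation = load_movie("sync_animation.gif")
sync_indicator.setMovie(animation)
sync_indicator.show()
animation.start()  # start while a sync is in flight...
# ...and call animation.stop() / sync_indicator.hide() when it completes.

app.exec_()
```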
OK... I need to revisit this work in light of #732.
looks like you just need to add tests and remove code unrelated to the removal of unnecessary GUI updates and the client freeze issue.
other than that, i like this approach
```diff
@@ -53,7 +53,7 @@ def call_api(self, api_client: API, session: Session) -> Any:
         # TODO: Once https://github.com/freedomofpress/securedrop-client/issues/648, we will want to
         # pass the default request timeout to api calls instead of setting it on the api object
         # directly.
-        api_client.default_request_timeout = 20
+        api_client.default_request_timeout = 120
```
so you'll notice we have an issue for acquiring realistic timeouts for api calls here: #648

if you read through john's findings you'll see that 20 seconds is cutting it close for `get_sources` when there are 200 sources, but it looks to be more than enough time for the other API calls made during a sync: `get_submissions` and `get_all_replies`. eventually we want to be able to scale these timeouts the way we do with file downloads, but for now, using a default of 20 seconds seems like plenty of time for most cases except when there are 200+ sources for `get_sources`.

unfortunately we don't yet support passing a different default timeout to just the `get_sources` api method, as you can see from my comment above:

```python
# TODO: Once https://github.com/freedomofpress/securedrop-client/issues/648, we will want to
# pass the default request timeout to api calls instead of setting it on the api object
# directly.
```

i think until we can vary our timeouts per request and based on the size of the request, we should use the largest measurement from john's profiling and multiply it by 1.5 or 2, so 30-40 second timeouts should be enough, right?
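Purely illustrative of the "scale it like file downloads" idea (the base and per-source numbers below are invented, not taken from john's profiling):

```python
def metadata_sync_timeout(num_sources: int) -> int:
    # Start from a default that covers small instances, then add headroom
    # that grows with the number of sources, mirroring how file-download
    # timeouts already scale with file size.
    base_timeout = 20   # seconds; plenty for get_submissions / get_all_replies
    per_source = 0.1    # hypothetical marginal cost per source
    return int(base_timeout + per_source * num_sources)

# e.g. metadata_sync_timeout(200) == 40, in the 1.5-2x range suggested above
```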
I am wondering if @ntoll tested with freedomofpress/securedrop#5100 in place on the server, which should already significantly improve performance (the cache gets primed on app server start).
securedrop_client/gui/widgets.py
Outdated
```diff
@@ -215,6 +215,9 @@ def __init__(self):

         self.active = False

+        self.download_animation = load_movie("sync_animation.gif")
```
as discussed during standup, you can just remove all these widgets.py changes and focus on optimizing the UI so that it doesn't freeze when there are many sources, by removing all the unnecessary `update_sources` calls and increasing the default timeout a bit.
securedrop_client/queue.py
Outdated
```diff
@@ -103,7 +103,12 @@ def re_add_job(self, job: ApiJob) -> None:
         '''
         job.remaining_attempts = DEFAULT_NUM_ATTEMPTS
         priority = self.JOB_PRIORITIES[type(job)]
-        self.queue.put_nowait((priority, job))
+        try:
```
this is out of scope and can be made into a separate PR, right? or was this needed for optimizing the UI during a sync? also see how we are moving towards a continuous background sync that only ever runs one sync at a time: #739
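The diff above is cut off after `try:`; my guess at the rest (a sketch, not the merged code) is a guard against a bounded queue being full, since `Queue.put_nowait` raises `queue.Full` in that case:

```python
import logging
from queue import Full, Queue

logger = logging.getLogger(__name__)
DEFAULT_NUM_ATTEMPTS = 5  # illustrative value

class RunnableQueue:
    """Fragment mirroring the diff above; not the actual merged code."""

    def __init__(self) -> None:
        self.queue: Queue = Queue(maxsize=10)  # bounded, so put_nowait can fail
        self.JOB_PRIORITIES: dict = {}         # job type -> priority (elided)

    def re_add_job(self, job) -> None:
        job.remaining_attempts = DEFAULT_NUM_ATTEMPTS
        priority = self.JOB_PRIORITIES[type(job)]
        try:
            self.queue.put_nowait((priority, job))
        except Full:
            # Log rather than crash: a saturated job queue shouldn't take
            # the whole client down with an unhandled exception.
            logger.error("Could not re-add job, queue is full")
```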
OK... I've extracted just the essential aspect of this fix, rebased/reset things, and wrapped it all up into a neat single commit. I'll add another PR to address the queue-related bugfix. NOTE: this branch does not include @rmol's GPG key branch any more.
lgtm
Description
Fixes #716
DO NOT MERGE this is a work in progress while further investigation takes place.
The initial removal of `update_sources` on message download ensures the UI remains responsive. However, given a server with 200 sources, while the UI is now responsive, I no longer see the results I was able to see with far fewer sources (50) in previous tests. The server activity suggests something is pinging the API every 20 seconds or so. Not sure what, though. I also notice from `top` that the client is taking up around 50% of my CPU and spawning lots of `gpg2`-related processes. After a while the client (responsively!) reports "The SecureDrop server cannot be reached".

As I understand it, we download everything in one go. Is there an API call we can make to get just the sources with their latest message id, and download only those instead? Upon clicking a source, if not already downloaded, perhaps only the messages for that specific source could be downloaded and processed?
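To make that question concrete, here is roughly what the lazy approach could look like client-side -- entirely hypothetical, since no such endpoint or flag exists today:

```python
class Controller:
    """Illustrative fragment; none of these calls exist in the client today."""

    def on_source_clicked(self, source) -> None:
        # Download a source's conversation only when the user opens it,
        # instead of pulling every message for every source during sync.
        if not source.messages_downloaded:          # hypothetical flag
            self.api.download_conversation(source)  # hypothetical API call
        self.gui.show_conversation_for(source)
```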
In any case, the end result is that the client responsively hangs and takes up CPU resources when there are a large number of sources. When there are only 50, the application appears to work fine, except that the latest-message preview in the source list is never updated (which can be coded around; see screenie below).
My gut feeling is that while the UI is now responsive, this highlights yet another performance problem -- i.e. the processing of large numbers of sources and their associated conversations.
Thoughts welcome.
Test Plan
Currently, keep poking with a stick and ask more questions.
Checklist
If these changes modify code paths involving cryptography, the opening of files in VMs or network (via the RPC service) traffic, Qubes testing in the staging environment is required. For fine tuning of the graphical user interface, testing in any environment in Qubes is required. Please check as applicable:
If these changes add or remove files other than client code, packaging logic (e.g., the AppArmor profile) may need to be updated. Please check as applicable: