Clean up threading #77

joshuathayer · 2018-10-26T23:47:04Z

This fixes our API request timeout handling, and in the process tries to clean up how we handle our API thread in general.

As a quick overview: in order to retain application interactivity while we do remote API calls, we run those API calls in a separate thread (as provided by Qt's QThread) from our main UI thread. That's a common and recommended pattern.

Specifically, we have a logic.Client class, which is responsible for spawning and managing an API thread. Client holds a ref to an client.APICallRunner object, which represents the remote API request. When it's time to run the request, the Client spawns a new thread, "moves" its APICallRunner object to the new thread, and starts the new thread. The APICallRunner's call_api method does the actual remote request, and its thread is blocked while that request runs.

The Client is responsible for managing that thread: it needs to have methods to handle successes and failures, including timing out.

The first issue this PR addresses was that we were started a QTimer in the APICallRunner thread, setting its timeout handler to a function in that thread. But since that thread is blocked while the remote request is running, it has no way of handling that callback until the remote request returns. This led to a bug where both the success and timeout handlers were being run: the request would take longer than 5 seconds and then eventually succeed. In the meantime, the timer would arrange for the timeout handler to be run on the API thread. When the remote request returned and the thread became unblocked, the success handler would be run (since the request succeeded), and also the timeout handler would run (since the timer arranged for that to happen).

I moved the timer and its handler into the Client object and its thread. When the timer runs out, we abandon it and its APICallRunner object (from the perspective of Client), but the thread is still running and will happily call its success callback when it eventually gets unblocked. So, we set a flag in the CallRunner object telling it that it's run out of time, and should silently do nothing when it eventually becomes unblocked.

Next, this tidies up our callbacks a bit, by handling Client-object state changes after request/failure in Client-wide functions, rather than in individual callbacks. After every request, we want to reset the client to an "empty" state: no APICallRunner object, no API thread, no callbacks registered. There are slightly different ways to do this on success and on failure. Rather than duplicating that logic in every callback handler, this PR now wraps the user-specific callback in a small function which handles the cleanup on its own.

Note that this involves a small change to how callbacks get to any response data. Previously, they'd look at self.api_runner.result. In this PR, we reset self.api_runner before the callback is run, so self.api_runner.result will never be defined while in the callback. Rather, the result is passed as an argument to the callback function directly, so callback signatures should be of the form

def foo_callback(self, result, result_data):

where result is a boolean, and result_data is the actual data returned from the API (previously found at self.api_runner.result)

This change allows us to greatly simplify how the client is reset between requests, and reduces the responsibilities of the success/failure callbacks (or, if you've consumed the same koolaide as me, it decomplects those responsibilities).

Multitheaded stuff is notoriously hard to test and to reason about. I only found these problems because I'm developing against a remote SD instance and was running in to timeout issues. I've manually tested various success and failure cases, but I suspect that there are still race conditions and cases that cause unexpected behavior... though I do think this puts us in a better state than before?

Also this work originated as part of my branch adding message loading. I manually pulled out this threading stuff, but there are some changes remaining here which have to do with message loading still... I'll to clean those up shortly.

joshuathayer · 2018-10-26T23:47:55Z

securedrop_client/logic.py

@@ -47,33 +47,29 @@ def __init__(self, api_call, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs
        self.result = None
+        self.i_timed_out = False


Flag telling the API thread that it's been timed out, and when it finally gets unblocked it should Do Nothing.

joshuathayer · 2018-10-26T23:48:22Z

securedrop_client/logic.py

-        self.timer = QTimer()
-        self.timer.timeout.connect(lambda: self.timeout.emit())
-        self.timer.setSingleShot(True)
-        self.timer.start(5000)


All the timer stuff moved to the Client object.

joshuathayer · 2018-10-26T23:50:08Z

securedrop_client/logic.py

        self.sync_flag = os.path.join(home, 'sync_flag')
+        self.home = home  # The "home" directory for client files.
+        self.data_dir = os.path.join(self.home, 'data')  # File data.
+        self.timer = None # call timeout timer


The Client object now has a timer, to detect API calls which take too long. The home stuff above is unrelated to the threading changes.

joshuathayer · 2018-10-26T23:51:38Z

securedrop_client/logic.py

+
+            # we've started a timer. when that hits zero, call our
+            # timeout function
+            self.timeout_api_call.connect(lambda: self.timeout_cleanup(timeout))


Rather than triggering the user-specified callbacks (for success and timeouts) directly, we wrap them in our own functions which will clean up the client, then call the callback functions.

joshuathayer · 2018-10-26T23:52:29Z

securedrop_client/logic.py

            self.api_thread.start()

+        else:
+            logger.info("There's already an API request running, so I'm not going to ignore this one (XXX this may not be the coolest thing to do...)")
+


As the log message implies, we should consider how we actually want to handle "concurrent" API requests...

joshuathayer · 2018-10-26T23:54:15Z

securedrop_client/logic.py

            self.api_runner = None
            self.api_thread = None
+            self.timer = None


call_reset is called by our callback wrappers. We take care to disconnect the timeout handler callbacks: if we don't, then we'll just add more and more callbacks over time, and when one call finally times out, all collected timeout handlers will get triggered (even for requests that have long-since completed).

joshuathayer · 2018-10-26T23:55:16Z

securedrop_client/logic.py

            self.api_thread = QThread(self.gui)
            self.api_runner = APICallRunner(function, *args, **kwargs)
            self.api_runner.moveToThread(self.api_thread)
-            self.api_runner.call_finished.connect(callback)
-            self.api_runner.timeout.connect(timeout)
-            self.finish_api_call.connect(self.api_runner.on_cancel_timeout)


This is handled in our callback wrapper now.

joshuathayer · 2018-10-26T23:55:36Z

securedrop_client/logic.py

            self.api_thread.started.connect(self.api_runner.call_api)
-            self.api_thread.finished.connect(self.call_reset)


This is also handled in our callback wrapper.

joshuathayer · 2018-10-26T23:56:27Z

securedrop_client/logic.py


    def login(self, username, password, totp):
        """
        Given a username, password and time based one-time-passcode (TOTP),
        create a new instance representing the SecureDrop api and authenticate.
        """
-        self.api = sdclientapi.API(self.hostname, username, password, totp)
+
+        self.api = sdclientapi.API(self.hostname, username, password, totp, proxy=True)


Ah, this proxy=True will probably break things for folks who are not running on Qubes, I think... Hmm.

yeah we can snip this out for review on non-Qubes OSes (cc #2)

joshuathayer · 2018-10-26T23:56:57Z

securedrop_client/logic.py

        """
        Handles the result of an authentication call against the API.
        """
-        self.call_reset()


Now handled in our callback wrapper

joshuathayer · 2018-10-26T23:57:19Z

securedrop_client/logic.py

+        """
+        self.timer.stop()
+
+    def on_authenticate(self, result, result_data):


note the new call signature, adding result_data (unused in this method, as it turns out)

joshuathayer · 2018-10-26T23:58:02Z

securedrop_client/logic.py

+            self.call_reset()
+
+        user_callback()
+


Here are our callback wrappers, which reset the state of the Client before calling user callbacks.

joshuathayer · 2018-10-26T23:58:51Z

securedrop_client/logic.py

+                          self.on_sync_timeout, self.api)
+            logger.info("In sync_api, after call to call_api, I'm in thread {}".format(self.thread().currentThreadId()))
+        else:
+            logger.info("It looks like you are not authenticated. Weird, isn't it.")


This was debugging output and should be removed

joshuathayer · 2018-10-26T23:59:34Z

securedrop_client/logic.py

+        Called when downloading a file has timed out.
+        """
+        # Update the status bar to indicate a failure state.
+        self.set_status("Connection to server timed out, please try again.")


This is leftover from my "message" branch and should be removed from this PR.

ntoll · 2018-10-29T14:14:45Z

@joshuathayer epic stuff and a much needed refactor.

Regarding QTimer... my understanding was that it was something that couldn't be blocked (i.e. Qt handled this automagically when the QTimer object was started). But I see from the docs, that we'd need to add an event loop to the thread (http://doc.qt.io/qt-5/qtimer.html#details), or do something else (as you've done) so good catch there!

Everything else you've done makes complete sense to me and I like how we're simplifying (or should that be decomplectifying, ;-) ?) . In any case, this completely resonates with the "just build something that works, then refactor when we know more" philosophy of my first pass at this. If we can wrap all the thread-y stuff in this way, it'll make any future API calls much easier.

Also, see my comments in #80 about the signal not indicating success/fail via a bool so we fix #71. Could this be wrapped into this PR..? Happy to do this work if need be.

redshiftzero · 2018-10-30T00:14:14Z

I started making minor changes to this (tests/linting fixes) so I could test and merge... but then I just remembered since this PR was proposed we added the use of current_object to keep track of which object is relevant in some (not all) callbacks for the file download logic. so I'm going to start making that update and add here since we need it here also to not break master (I think #71 we should do in a followup since we don't strictly need it -- smaller PRs are best PRs)

redshiftzero · 2018-10-30T03:41:40Z

securedrop_client/logic.py

@@ -67,7 +68,8 @@ def call_api(self):
        if self.i_timed_out is False:
            self.call_finished.emit(result_flag)
        else:
-            logger.info("Thread returned from API call, but it had timed out.")
+            logger.info("Thread returned from API call, "


note this is basically just for informational purposes for devs right now so this one line I no covered

redshiftzero · 2018-10-30T05:15:52Z

OK this seems to work fine in non-Qubes. In Qubes I was able to download a file but I'm still seeing a lot of timeouts when downloading files... @kushaldas or @joshuathayer if you get a chance to try this out in Qubes please do

ntoll

LGTM. Nice work..! (I read through the code and tested it runs fine against a local SD server and tests pass).

Question: do we want to wrap up all the threading related changes into this one handy PR (see #80)..?

redshiftzero · 2018-10-31T00:10:22Z

closing this in favor of #85

redshiftzero · 2018-10-31T00:10:38Z

(which also has these commits)

joshuathayer requested review from ntoll, redshiftzero and heartsucker October 26, 2018 23:47

joshuathayer commented Oct 26, 2018

View reviewed changes

ntoll mentioned this pull request Oct 29, 2018

Refactor API calls to be more robust #80

Closed

joshuathayer and others added 5 commits October 29, 2018 15:47

clears up threading, hopefully

09d47e0

Remove message-related stuff

479c171

Remove message-related things

0292c00

Remove duplicated line

e8db41c

Fix linting, add debug level for some logging, remove proxy flag

d23affe

redshiftzero force-pushed the cleanup-threading branch from 7d8f550 to d23affe Compare October 29, 2018 22:59

redshiftzero reviewed Oct 30, 2018

View reviewed changes

redshiftzero added 2 commits October 29, 2018 21:33

Add current_object logic to API refactor

2236fd4

API refactor: Update unit tests

5323e5f

redshiftzero force-pushed the cleanup-threading branch from 0944f97 to 5323e5f Compare October 30, 2018 04:33

ntoll approved these changes Oct 30, 2018

View reviewed changes

This was referenced Oct 30, 2018

API calls that could receive an "empty" object will incorrectly appear as an error` #71

Closed

Robustify threading #85

Merged

redshiftzero closed this Oct 31, 2018

heartsucker deleted the cleanup-threading branch May 15, 2019 12:50

conorsch mentioned this pull request Dec 13, 2023

[securedrop-proxy] Ensure proxy only starts inside sd-proxy VM #1695

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clean up threading #77

Clean up threading #77

joshuathayer commented Oct 26, 2018

joshuathayer Oct 26, 2018

joshuathayer Oct 26, 2018

joshuathayer Oct 26, 2018 •

edited

Loading

joshuathayer Oct 26, 2018

joshuathayer Oct 26, 2018

joshuathayer Oct 26, 2018

joshuathayer Oct 26, 2018

joshuathayer Oct 26, 2018

joshuathayer Oct 26, 2018

redshiftzero Oct 27, 2018

joshuathayer Oct 26, 2018

joshuathayer Oct 26, 2018

joshuathayer Oct 26, 2018

joshuathayer Oct 26, 2018

joshuathayer Oct 26, 2018

ntoll commented Oct 29, 2018

redshiftzero commented Oct 30, 2018

redshiftzero Oct 30, 2018

redshiftzero commented Oct 30, 2018

ntoll left a comment

redshiftzero commented Oct 31, 2018

redshiftzero commented Oct 31, 2018

		self.api_thread.started.connect(self.api_runner.call_api)
		self.api_thread.finished.connect(self.call_reset)

Clean up threading #77

Clean up threading #77

Conversation

joshuathayer commented Oct 26, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joshuathayer Oct 26, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ntoll commented Oct 29, 2018

redshiftzero commented Oct 30, 2018

Choose a reason for hiding this comment

redshiftzero commented Oct 30, 2018

ntoll left a comment

Choose a reason for hiding this comment

redshiftzero commented Oct 31, 2018

redshiftzero commented Oct 31, 2018

joshuathayer Oct 26, 2018 •

edited

Loading