Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up threading #77

Closed
wants to merge 7 commits into from
Closed

Clean up threading #77

wants to merge 7 commits into from

Conversation

joshuathayer
Copy link
Contributor

This fixes our API request timeout handling, and in the process tries to clean up how we handle our API thread in general.

As a quick overview: in order to retain application interactivity while we do remote API calls, we run those API calls in a separate thread (as provided by Qt's QThread) from our main UI thread. That's a common and recommended pattern.

Specifically, we have a logic.Client class, which is responsible for spawning and managing an API thread. Client holds a ref to an client.APICallRunner object, which represents the remote API request. When it's time to run the request, the Client spawns a new thread, "moves" its APICallRunner object to the new thread, and starts the new thread. The APICallRunner's call_api method does the actual remote request, and its thread is blocked while that request runs.

The Client is responsible for managing that thread: it needs to have methods to handle successes and failures, including timing out.

The first issue this PR addresses was that we were started a QTimer in the APICallRunner thread, setting its timeout handler to a function in that thread. But since that thread is blocked while the remote request is running, it has no way of handling that callback until the remote request returns. This led to a bug where both the success and timeout handlers were being run: the request would take longer than 5 seconds and then eventually succeed. In the meantime, the timer would arrange for the timeout handler to be run on the API thread. When the remote request returned and the thread became unblocked, the success handler would be run (since the request succeeded), and also the timeout handler would run (since the timer arranged for that to happen).

I moved the timer and its handler into the Client object and its thread. When the timer runs out, we abandon it and its APICallRunner object (from the perspective of Client), but the thread is still running and will happily call its success callback when it eventually gets unblocked. So, we set a flag in the CallRunner object telling it that it's run out of time, and should silently do nothing when it eventually becomes unblocked.

Next, this tidies up our callbacks a bit, by handling Client-object state changes after request/failure in Client-wide functions, rather than in individual callbacks. After every request, we want to reset the client to an "empty" state: no APICallRunner object, no API thread, no callbacks registered. There are slightly different ways to do this on success and on failure. Rather than duplicating that logic in every callback handler, this PR now wraps the user-specific callback in a small function which handles the cleanup on its own.

Note that this involves a small change to how callbacks get to any response data. Previously, they'd look at self.api_runner.result. In this PR, we reset self.api_runner before the callback is run, so self.api_runner.result will never be defined while in the callback. Rather, the result is passed as an argument to the callback function directly, so callback signatures should be of the form

def foo_callback(self, result, result_data):

where result is a boolean, and result_data is the actual data returned from the API (previously found at self.api_runner.result)

This change allows us to greatly simplify how the client is reset between requests, and reduces the responsibilities of the success/failure callbacks (or, if you've consumed the same koolaide as me, it decomplects those responsibilities).

Multitheaded stuff is notoriously hard to test and to reason about. I only found these problems because I'm developing against a remote SD instance and was running in to timeout issues. I've manually tested various success and failure cases, but I suspect that there are still race conditions and cases that cause unexpected behavior... though I do think this puts us in a better state than before?

Also this work originated as part of my branch adding message loading. I manually pulled out this threading stuff, but there are some changes remaining here which have to do with message loading still... I'll to clean those up shortly.

@@ -47,33 +47,29 @@ def __init__(self, api_call, *args, **kwargs):
self.args = args
self.kwargs = kwargs
self.result = None
self.i_timed_out = False
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flag telling the API thread that it's been timed out, and when it finally gets unblocked it should Do Nothing.

self.timer = QTimer()
self.timer.timeout.connect(lambda: self.timeout.emit())
self.timer.setSingleShot(True)
self.timer.start(5000)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the timer stuff moved to the Client object.

self.sync_flag = os.path.join(home, 'sync_flag')
self.home = home # The "home" directory for client files.
self.data_dir = os.path.join(self.home, 'data') # File data.
self.timer = None # call timeout timer
Copy link
Contributor Author

@joshuathayer joshuathayer Oct 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Client object now has a timer, to detect API calls which take too long. The home stuff above is unrelated to the threading changes.


# we've started a timer. when that hits zero, call our
# timeout function
self.timeout_api_call.connect(lambda: self.timeout_cleanup(timeout))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than triggering the user-specified callbacks (for success and timeouts) directly, we wrap them in our own functions which will clean up the client, then call the callback functions.

self.api_thread.start()

else:
logger.info("There's already an API request running, so I'm not going to ignore this one (XXX this may not be the coolest thing to do...)")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the log message implies, we should consider how we actually want to handle "concurrent" API requests...

self.api_runner = None
self.api_thread = None
self.timer = None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call_reset is called by our callback wrappers. We take care to disconnect the timeout handler callbacks: if we don't, then we'll just add more and more callbacks over time, and when one call finally times out, all collected timeout handlers will get triggered (even for requests that have long-since completed).

self.api_thread = QThread(self.gui)
self.api_runner = APICallRunner(function, *args, **kwargs)
self.api_runner.moveToThread(self.api_thread)
self.api_runner.call_finished.connect(callback)
self.api_runner.timeout.connect(timeout)
self.finish_api_call.connect(self.api_runner.on_cancel_timeout)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is handled in our callback wrapper now.

self.api_thread.started.connect(self.api_runner.call_api)
self.api_thread.finished.connect(self.call_reset)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also handled in our callback wrapper.


def login(self, username, password, totp):
"""
Given a username, password and time based one-time-passcode (TOTP),
create a new instance representing the SecureDrop api and authenticate.
"""
self.api = sdclientapi.API(self.hostname, username, password, totp)

self.api = sdclientapi.API(self.hostname, username, password, totp, proxy=True)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this proxy=True will probably break things for folks who are not running on Qubes, I think... Hmm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we can snip this out for review on non-Qubes OSes (cc #2)

"""
Handles the result of an authentication call against the API.
"""
self.call_reset()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now handled in our callback wrapper

"""
self.timer.stop()

def on_authenticate(self, result, result_data):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note the new call signature, adding result_data (unused in this method, as it turns out)

self.call_reset()

user_callback()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are our callback wrappers, which reset the state of the Client before calling user callbacks.

self.on_sync_timeout, self.api)
logger.info("In sync_api, after call to call_api, I'm in thread {}".format(self.thread().currentThreadId()))
else:
logger.info("It looks like you are not authenticated. Weird, isn't it.")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was debugging output and should be removed

Called when downloading a file has timed out.
"""
# Update the status bar to indicate a failure state.
self.set_status("Connection to server timed out, please try again.")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is leftover from my "message" branch and should be removed from this PR.

@ntoll
Copy link
Contributor

ntoll commented Oct 29, 2018

@joshuathayer epic stuff and a much needed refactor.

Regarding QTimer... my understanding was that it was something that couldn't be blocked (i.e. Qt handled this automagically when the QTimer object was started). But I see from the docs, that we'd need to add an event loop to the thread (http://doc.qt.io/qt-5/qtimer.html#details), or do something else (as you've done) so good catch there!

Everything else you've done makes complete sense to me and I like how we're simplifying (or should that be decomplectifying, ;-) ?) . In any case, this completely resonates with the "just build something that works, then refactor when we know more" philosophy of my first pass at this. If we can wrap all the thread-y stuff in this way, it'll make any future API calls much easier.

Also, see my comments in #80 about the signal not indicating success/fail via a bool so we fix #71. Could this be wrapped into this PR..? Happy to do this work if need be.

@redshiftzero
Copy link
Contributor

I started making minor changes to this (tests/linting fixes) so I could test and merge... but then I just remembered since this PR was proposed we added the use of current_object to keep track of which object is relevant in some (not all) callbacks for the file download logic. so I'm going to start making that update and add here since we need it here also to not break master (I think #71 we should do in a followup since we don't strictly need it -- smaller PRs are best PRs)

@@ -67,7 +68,8 @@ def call_api(self):
if self.i_timed_out is False:
self.call_finished.emit(result_flag)
else:
logger.info("Thread returned from API call, but it had timed out.")
logger.info("Thread returned from API call, "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note this is basically just for informational purposes for devs right now so this one line I no covered

@redshiftzero
Copy link
Contributor

OK this seems to work fine in non-Qubes. In Qubes I was able to download a file but I'm still seeing a lot of timeouts when downloading files... @kushaldas or @joshuathayer if you get a chance to try this out in Qubes please do

Copy link
Contributor

@ntoll ntoll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Nice work..! (I read through the code and tested it runs fine against a local SD server and tests pass).

Question: do we want to wrap up all the threading related changes into this one handy PR (see #80)..?

@redshiftzero
Copy link
Contributor

closing this in favor of #85

@redshiftzero
Copy link
Contributor

(which also has these commits)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants