Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UI: Log streaming bug fix medley #7820

Merged
merged 5 commits into from
Apr 30, 2020
Merged

UI: Log streaming bug fix medley #7820

merged 5 commits into from
Apr 30, 2020

Conversation

DingoEatingFuzz
Copy link
Contributor

This fixes a couple bugs @Gurpartap reported in #7770 as well as a couple more I found along the way.

Bug 1: Switching from stdout to stderr before the stdout request fails or times out causes an error on the stderr log stream despite successfully streaming the logs.

This was a fun race condition. Logs first attempt to connect to the client an allocation is running on before falling back on the server which should always work. When the client request fails, a flag is set to failover to the server. When that server failover flag is already set and another request fails, connecting to logs is deemed a total loss and the "Cannot fetch logs" error is shown.

If a user clicks stderr before stdout triggers the first failure flag then the stderr request will attempt to connect to the client. This causes both stdout and stderr to fail if the client can't be reached. The first failure flips the fallback flag and the second failure flips the total loss flag. However, when the fallback flag is flipped, it still causes a server request to be made, which is why log streaming continued to work.

The fix here was to capture the failover flag state at request time rather than response time. This way both racing client requests flip the same flag (or rather the first one flips it and the second is a no-op).

Bug 2: The "Cannot fetch log" error persists when switching between stdout/stderr

This one technically isn't a bug. If the error message is shown, it means no connection can be established. It only seems like a bug when Bug 1 occurs. However, I too dislike error messages that can't be dismissed, so I made it dismissable anyway.

Bug 3: Failing over to a server request after a client request takes longer than the client timeout threshold but still eventually responds causes two parallel stream requests, one which never closes.

This one was always known about but AbortControllers where still a relatively new thing when this was first written.

The task logger wisely has its own timeout for client requests to prevent a hanging request from triggering the fallback to making a server request. 9/10 (maybe even 99/100?) times this request will eventually timeout because the client can't be reached, but in the event the client does respond, that connection would remain open indefinitely eating up one of the precious few parallel connections a browser gets to a single host.

The fix here is to use an AbortController as mentioned. They now have broad support. The only browser we are concerned with that does not have them is IE11 which is ok in this instance since IE11 also doesn't support response streaming so it will use the poll logger anyway.

Bug 4: Clicking stdout/stderr when already on stdout/stderr stops the log streamer.

This one wasn't tricky by any stretch of the imagination. Just a small detail I overlooked. The fix was to return early if the mode being switched to was already the current mode.

Fixes #7770

@DingoEatingFuzz DingoEatingFuzz requested review from backspace and a team April 28, 2020 15:53
Copy link
Contributor

@backspace backspace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice collection of concise fixes for some difficult problems; I have a couple of comments on tests but nothing major.

@@ -255,5 +279,54 @@ module('Integration | Component | task log', function(hooks) {
'Log request was later made to the server'
);
assert.ok(find('[data-test-connection-error]'), 'An error message is shown');

await click('[data-test-connection-error-dismiss]');
// await settled();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this could be removed?


run.later(() => {
click('[data-test-log-action="stdout"]');
run.later(run, run.cancelTimers, commonProps.interval * 6);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well use just interval here too?

click('[data-test-log-action="stderr"]');
run.later(run, run.cancelTimers, commonProps.interval * 5);
}, allowedConnectionTime / 2);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test setup!

'Server request for stderr'
);

assert.notOk(find('[data-test-connection-error]'), 'An error message is shown');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The explanation should include “not” I think, but maybe it’s not necessary at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You caught me copy/pasting. The description probably isn't necessary, but I was following the convention of the file.

…onably

Now options can be provided without also having to remember to pass
credentials. This is convenient for abort controller signals.
Typically a failover means that the client can't be reached. However, if
the client does eventually return after the timeout period, the log will
stream indefinitely. This fixes that using an API that wasn't broadly
available at the time this was first written.
…nnection error

This would happen because a no connection error happens after the second request fails, but
that's because it's assumed the second request is to a server node. However, if a user clicks
stderr fast enough, the first and second requests are both to the client node. This changes
the logic to check if the request is to the server before deeming log streaming a total failure.
@DingoEatingFuzz DingoEatingFuzz merged commit 25a74ec into master Apr 30, 2020
@DingoEatingFuzz DingoEatingFuzz deleted the b-ui/ui-log-races branch April 30, 2020 01:06
@pySilver
Copy link

pySilver commented Jun 4, 2020

Quick question – it this change is available in 0.11.2 ? Somehow I've started having "Cannot fetch logs" all over again after upgrading from 0.11.1 :(

@DingoEatingFuzz
Copy link
Contributor Author

Yes, this is in 0.11.2.

Since you're still seeing log funkiness, can you open a new issue with more details?

@github-actions
Copy link

github-actions bot commented Jan 3, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ui logs: quickly switching from stdout to stderr produces error
3 participants