-
-
Notifications
You must be signed in to change notification settings - Fork 660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial fetch timeout (with working tests) #4193
Initial fetch timeout (with working tests) #4193
Conversation
61e2579
to
c83268b
Compare
(Deleted a long comment about being stalled on a RN thing, which we've moved past; see below.) |
And, as I just noted at zulip/react-native#5 (comment), we've found an easier, less invasive way! I'll push a new revision with that soon and finally un-mark this as a draft. |
48b3a71
to
c4137b8
Compare
OK! |
c4137b8
to
4ea6f61
Compare
I just fixed some conflicts and made a few commit-message tweaks. 🙂 |
4ea6f61
to
f13fee9
Compare
I just fixed some conflicts. 🙂 |
1fe3ed6
to
f6a2685
Compare
Just fixed a small conflict; small bump for a review (if still a P1 issue, at least). 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed up until the commits removing Lolex.
Thanks for working on this!
src/actionTypes.js
Outdated
* Notify Redux that we've given up on the initial fetch. | ||
* | ||
* Either because our timeout implementation says we've tried for long | ||
* enough, or because the server has responded with a 5xx error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commit talks about 5xx errors, the previous commit about 4xx errors. Is that correct? Does this action get sent if there's a 4xx error, or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commit talks about 5xx errors, the previous commit about 4xx errors. Is that correct?
I believe so, although I think I could be clearer. 🙂 Specifically, the role I have in mind for INITIAL_FETCH_ABORT
is not for cases where breaking from the retry loop is the only rational choice—rather, it's for when we just have a pretty good idea that retrying has stopped being useful, even if that isn't proven. Possibly renaming INITIAL_FETCH_ABORT
could help here; at least, I should expand the jsdoc.
On a 4xx error, breaking from the retry loop is the only rational choice, and I don't have this action getting dispatched on those. It's a client error; what we've been sending the server isn't any good, and we should bail on this retry loop. We should send a Sentry error report, if the server tells us we've given it garbage; or log out, if the auth is invalid; and so on. I suppose, if we were nimble enough as a client, there might be cases where we have some other reasonable input to try giving the server as a fallback—but I think even in those cases it's cleanest to throw out the current retry loop and start a new one, with fresh backoff state.
If we've just waited long enough (that we don't think we'll get an answer soon, or that we think the user wants to stop waiting, etc.), that's a suggestion that we should break out of the retry loop, but it's not a proof that waiting longer would be useless. Same with 5xx errors, I think—the server hasn't told us we've done anything wrong, and the issue might clear up in a second or two. But, realistically, what are the chances that it'll be cleared up soon anyway...might as well give up, even though we might have gotten lucky if we kept trying. 🤷
(I do see that I haven't actually started dispatching INITIAL_FETCH_ABORT
on 5xx errors in this revision, which is quite old…huh, perhaps I should do that! 🤔)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though, seeing the ratio of the amount I've said just now, above, to what actually made it into comments/jsdoc, I think it could be helpful to have my assumptions examined, possibly on CZO; what do you think? 🙂
NavigationService.dispatch(resetToAccountPicker()); | ||
dispatch(initialFetchAbortPlain()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a common pattern? It seems strange to me, I'd expect the navigation to be driven by the receiver of the action, not the sender. But I don't know much about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect the navigation to be driven by the receiver of the action, not the sender.
Yeah; we used to have something like that.
Before #3804 was resolved, we would store React Navigation's state in our Redux store, and we had a navReducer
that we could make recognize an action like initialFetchAbortPlain()
and update the nav state accordingly.
(We actually mixed that approach with dispatching actions from the action creators in navActions
, with the same dispatch
we use for all the other actions. NavigationService
is a stepping stone toward using React Navigation's navigation
prop/object.)
With React Navigation's state not in our Redux store anymore, that old approach in navReducer
isn't available.
We also can't add a go-navigate-somewhere side effect to our existing reducers because reducers explicitly aren't the place for side effects.
The choice to put NavigationService.dispatch(resetToAccountPicker());
(or I guess, one day, navigation.reset({ index: 0, routes: [{ name: 'account-pick' }] });
or similar) in the thunk action creator initialFetchAbort
is a bit subtle: it follows our strategy to eliminate a class of bugs where we try to do stuff before the store has rehydrated; Greg has a good explanation of that here.
I've also thought it might be nice to not count on remembering to call go-navigate-somewhere code alongside dispatching things like logoutPlain()
. The fact that we do is a reminder that dispatching logoutPlain()
doesn't…finish the job of logging out: navigating away from the logged-in screens is an important part of that. 🙂
store.subscribe()
comes to mind, but that's a low-level API and it doesn't tell a passed handler about actions that have been dispatched; only the fact that an action has been dispatched.
I think that leaves Redux middleware—I could imagine writing some middleware that does things like see an INITIAL_FETCH_ABORT
-type action go by, and navigate based on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've also thought it might be nice to not count on remembering to call go-navigate-somewhere code alongside dispatching things like
logoutPlain()
. The fact that we do is a reminder that dispatchinglogoutPlain()
doesn't…finish the job of logging out: navigating away from the logged-in screens is an important part of that. 🙂
Ah, I just edited logout()
here to logoutPlain()
, after I saw that that's what I meant. I think the situation there isn't quite as bad as I'd thought; logout()
(the thunk action) does do the essential job of navigating, which makes it a bit more understandable that logoutPlain()
does not. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, in this PR, the thunk action initialFetchAbort()
does do the essential job of navigating, while the plain action initialFetchAbortPlain()
doesn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if I can unpack any of that, maybe on CZO. 🙂
src/utils/async.js
Outdated
/** | ||
* Time-out a Promise after `timeLimitMs` has passed. | ||
* | ||
* Returns a new Promise with the same outcome as `promise`, if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The word "outcome" here (and in the ¶ below) is a bit confusing to me — is this common Promise terminology? I'd find something like:
Returns a new Promise that resolves to whatever `promise` resolved to, if
`promise` completes in time.
If `promise` does not complete before `timeLimitMs` has passed, this function
calls `onTimeout` and returns a promise that resolves to its return value.
clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah; I looked around for a more common term to use; the MDN doc on Promise
just uses "outcome" once, and I liked it when it did. Hmm.
I meant to write the interface without assuming promise
will resolve; instead, it might reject. See Greg's comment at #4166 (comment).
Maybe something like this:
* Returns a new Promise with the same outcome (resolved/rejected) as
* `promise`, if `promise` completes in time.
*
* If `promise` does not complete before `timeLimitMs` has passed,
* `onTimeout` is called, and its outcome is used as the outcome of
* the returned Promise.
src/message/fetchActions.js
Outdated
@@ -271,26 +270,46 @@ const fetchPrivateMessages = () => async (dispatch: Dispatch, getState: GetState | |||
* If the function is an API call and the response has HTTP status code 4xx | |||
* the error is considered unrecoverable and the exception is rethrown, to be | |||
* handled further up in the call stack. | |||
* | |||
* After a certain duration, times out with a TimeoutError. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/a certain duration/MAX_TIME_MS/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(EDIT: my point below is moot in the current revision)
I think the following point will be moot if we move MAX_TIME_MS
(with a rename) somewhere else, like in config.js
. 🙂 But it's a chance to bring up one of our habits so far.
In the code you're looking at, I intentionally left MAX_TIME_MS
out of the jsdoc. Our pattern with functions has been to use jsdoc for interface, and //
-style comments for implementation. While I haven't always gotten this right, I think I might have in this case (though with the big asterisk in my previous paragraph 😉): someone reading the jsdoc can't resolve MAX_TIME_MS
to a meaningful value without peeking under the hood and looking at its definition in the function's implementation. For that reason, I figured it didn't belong in the jsdoc.
I realized one thing quite a while after learning about that pattern: VSCode, at least with our config, parses a function's jsdoc and shows it to would-be callers on a hover interaction, etc. (I imagine other editors can be made to do that too.) When I remember this, it can help me decide what's interface vs. implementation: a good jsdoc will help callers do their business without even having to open the file that the function is defined in.
I think the right way forward, here, might be to define the constant in config.js
, in which case we can freely refer to it in the jsdoc because config.js
is a scope that callers will naturally be aware of. Does that sound right? And @gnprice, how's my interpretation of the jsdoc / //
pattern here; is it too rigid? 🙂
src/message/fetchActions.js
Outdated
await backoffMachine.wait(); | ||
} | ||
} | ||
// Without this, Flow 0.92.1 does not know this code is unreachable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't we using Flow 0.128.0
now? Does that have this same bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, looks like it; I'll update the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Er, and this has disappeared with the change to promiseTimeout
's interface described at #4193 (comment).
🎉
src/message/fetchActions.js
Outdated
*/ | ||
export async function tryFetch<T>(func: () => Promise<T>): Promise<T> { | ||
const MAX_TIME_MS: number = 60000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A minute seems quite long for this timeout. How'd you decide on that?
(It could be reasonable, certainly it's quite bad if we get into a failure loop because we picked too optimistic a value, I'm just curious how you arrived here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That came from the description of #4165; it's not super precise:
As a minimal version, if the initial fetch takes a completely unreasonable amount of time (maybe one minute), we should give up and take you to the account-picker screen so you can try a different account and server.
Still, good idea to say something in a code comment—possibly this actually belongs in config.js
?
Thanks for the review, @WesleyAC! I've left some responses on each of your comments. Looking again, I wonder if we really want to model |
I've got an answer there and am ready to continue working on this. 🙂 |
So it's easier to compare the inputs we're testing. 10ms is a realistic amount of time for an API request to take. It's also less than 100ms, which had been used for one of these. So, our tests will run a bit faster (and they'll run a lot faster when we switch over to fake timers soon).
And make the tests more rigorous while we're at it. When we add a timeout to `tryFetch`, we'll want to use fake timers so that the tests don't try to do inconvenient things like waiting a whole minute for something to happen.
The fact that this test passes isn't good -- it means a basically arbitrary, unexpected kind of error will let the retry loop continue without propagating to `tryFetch`'s caller. We'll fix that logic soon, and add a test case with an error like that. But for now, test that we get the right behavior with representative inputs.
We'll use this in `tryFetch`, in an upcoming commit, so that we can interrupt in-progress attempts to contact a server when they take too long. See discussion [1]. [1]: https://chat.zulip.org/#narrow/stream/243-mobile-team/topic/Stuck.20on.20loading.20screen/near/907693
To be dispatched when it doesn't seem like we'll get a response from the server, or when the server responds with a 5xx error. It navigates to the 'accounts' screen so a user can try a different account and server. Logging out wouldn't be good; the credentials may be perfectly fine, and we'd like to keep them around to try again later. It sets `needsInitialFetch` to `false` [1], just like `INITIAL_FETCH_COMPLETE`, while retaining a different meaning than that action (i.e., that the fetch was aborted instead of completed). Setting `needsInitialFetch` to false is necessary to ensure that a subsequent initial fetch can be triggered when we want it to be. As also noted in 7caa4d0, `needsInitialFetch` is "edge-triggered". (That edge-triggering logic seems complex and fragile, and it would be nice to fix that.) See also discussion [1]. [1]: https://chat.zulip.org/#narrow/stream/243-mobile-team/topic/Stuck.20on.20loading.20screen/near/907591
Change the condition for exiting the retry loop from `isClientError(e)` to `!isServerError(e)`.
So far, `tryFetch`'s only callers are in the initial fetch; so, add handling for the `TimeoutError` there. The choice of value for `requestLongTimeoutMs` comes from a suggestion in zulip#4165's description: > As a minimal version, if the initial fetch takes a completely > unreasonable amount of time (maybe one minute), we should give up > and take you to the account-picker screen so you can try a > different account and server. Fixes: zulip#4165
As Greg points out [1], this makes the most sense conceptually; it's happening at the bottom of the loop, just before a new iteration starts. The `return` in the `try` block is enough to ensure that this wait won't interfere with a successful fetch. [1]: zulip#4166 (comment)
Greg points out that the initial fetch isn't actually a place where we want to retry on 5xx errors [1]: > Ah, I think in #M4165 the point is that if the server isn't > responding, we want to give you the option to go choose some other > account. The context there is that we're in the initial fetch, so > showing the loading screen, and as long as we're doing that > there's no other UI. > So yeah, I think basically we don't want to do any retrying here. > Instead we can kick you to the account-picker screen, with a toast > or something to indicate an error, and then you might manually > retry a time or two or you might bail and switch to some other > account. > And in particular if you didn't even want to be using that account > anymore -- maybe you even know that it's a server which is > permanently shut down, but it just happened to be the last one > you'd been using in the app and so it's the one we tried loading > data from on startup -- then you can go use whatever other account > you were actually opening the app to use. This does mean that `tryFetch` now has no callsites. It's a useful function, so we'll add a useful callsite soon so the function still gets exercised and maintained as appropriate. [1] https://chat.zulip.org/#narrow/stream/243-mobile-team/topic/Stuck.20on.20loading.20screen/near/1178689
A `TimeoutError` will be handled the same way other errors in `fetchMessages` are handled; if it's a timeout in the fetch `ChatScreen` does on mount, `ChatScreen` will show the `FetchError` component we set up in zulip#4205. There's also been a passing mention on CZO of doing a timeout like this [1]: > After a long time, probably like a minute, we'll want that [...] > fetch to time out and fail in any case. [1] https://chat.zulip.org/#narrow/stream/243-mobile-team/topic/.23M4156.20Message.20List.20placeholders/near/950853
And add that message and the existing message to messages_en.json; looks like we forgot to add the existing one.
Use Jest's "modern" fake timers instead of our Lolex wrapper. Also, remove one `describe` block for tests that examine an edge-case safety feature that we built into our Lolex wrapper, but that doesn't seem to exist in Jest. Ah, well.
Use Jest's "modern" fake timers instead of our Lolex wrapper.
Use Jest's "modern" fake timers instead of our Lolex wrapper.
Use Jest's "modern" fake timers instead of our Lolex wrapper.
Use Jest's "modern" fake timers instead of our Lolex wrapper.
We've entirely switched over to Jest's "modern" fake timers, which landed in jestjs/jest#7776.
Also, remove several now-unnecessary calls of `jest.useFakeTimers('modern')`, but keep a few assertions that the "modern" timers are actually being used. In particular, our `jestSetup` is a central place where we make the assertion. Not only is it good to check that we still intentionally set the "modern" implementation, but we want to make sure that the setting is correctly applied. See the note in fb23341 about it being silently not applied until we added @jest/source-map as a direct dependency. We have an ESLint rule, from 2faad06, preventing imports from '**/__tests__/**'; the rule is active in all files not matching that same pattern. Add an additional override so that we can make the "modern"-timers assertion from within `jest/jestSetup.js`.
Follow and delete a code comment at the top of `backoffMachine-test`, suggesting that we move these tests.
f6a2685
to
8c20c8b
Compare
Revision pushed! This one keeps having And this revision starts using There are a lot of commits in this revision, so if you spot an opportunity to merge the first n commits, or something, feel free to do that, to help focus the review. I'm also happy to split this into multiple PRs. 🙂 |
This is the "minimal version" of a fix Greg describes in the issue; as he mentions there, it would be good to make follow-up improvements.
Fixes: #4165
EDIT: The below hack is no longer necessary, and this branch doesn't use it (Greg and I worked it out on the phone).
EDIT, again: Deleted the description of that hack in case it was deterring a review. 🙂