Allow coroutines to fail without destabilizing the app #1741

BenHenning · 2020-08-28T23:16:36Z

Is your feature request related to a problem? Please describe.
When a coroutine fails, it puts its enclosing coroutine scope into a failed state. This is unexpected: a single background failure causes every subsequent background task in that scope to fail as well, which prevents the app from working correctly from then on.
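
To make the failure mode concrete, here is a minimal sketch that reproduces it outside the app. It assumes a scope built on a plain Job; the function name and the no-op exception handler are illustrative and do not come from the codebase.

```kotlin
import kotlinx.coroutines.CoroutineExceptionHandler
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.Job
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical helper: reports whether a task launched after a failure
// in the same scope got to run.
fun laterTaskRunsInPlainJobScope(): Boolean {
  // Swallow the uncaught exception so it is not forwarded to the thread's handler.
  val handler = CoroutineExceptionHandler { _, _ -> }
  val scope = CoroutineScope(Job() + Dispatchers.Default + handler)
  var laterTaskRan = false
  runBlocking {
    // The first background task fails, which cancels the scope's Job.
    scope.launch { throw IllegalStateException("background failure") }.join()
    // Launched into an already-cancelled scope: the body is never executed.
    scope.launch { laterTaskRan = true }.join()
  }
  return laterTaskRan
}
```

Under these assumptions the function should return false: once one task has failed, nothing launched in that scope afterwards runs.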

Describe the solution you'd like
We should:

Log & track exceptions in background tasks
Propagate that failure to the UI so that it can respond or show something to the user
Continue background execution by leveraging a SupervisorJob

For the most part, background execution happens via DataProviders, so this solution probably just needs to be built into NotifiableAsyncLiveData. We actually do want failures of child coroutines to trigger an outright failure of the job (including across transformed or combined data providers), but we don't want an independent background DataProvider operation to trigger a failure in an unrelated DataProvider.
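
A rough sketch of the first and third points, assuming a SupervisorJob combined with a CoroutineExceptionHandler. SupervisedBackgroundRunner is a hypothetical stand-in, not an existing class in the app, and how recorded failures reach the UI is left open.

```kotlin
import kotlinx.coroutines.CoroutineExceptionHandler
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import java.util.concurrent.CopyOnWriteArrayList

// Hypothetical stand-in for the app's background execution.
class SupervisedBackgroundRunner {
  // Failures recorded here could be logged and surfaced to the UI.
  val recordedFailures = CopyOnWriteArrayList<Throwable>()

  private val handler = CoroutineExceptionHandler { _, error ->
    recordedFailures.add(error)
  }

  // SupervisorJob: a failing child does not cancel the scope or its siblings.
  val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default + handler)
}

// Returns (number of recorded failures, whether the later task ran).
fun runFailingThenHealthyTask(): Pair<Int, Boolean> {
  val runner = SupervisedBackgroundRunner()
  var laterTaskRan = false
  runBlocking {
    runner.scope.launch { throw IllegalStateException("background failure") }.join()
    // The scope is still active, so this task executes normally.
    runner.scope.launch { laterTaskRan = true }.join()
  }
  return runner.recordedFailures.size to laterTaskRan
}
```

Here the failure is recorded once and the later task still runs, which is the behavior the list above asks for.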

Describe alternatives you've considered
SupervisorJob seems to be the mechanism invented for this purpose. Two alternatives might be:

Introducing a general purpose task system (effectively reinventing SupervisorJob or coroutines)
Not using coroutines (similar to above)

Both of these approaches seem harder & more involved than leveraging SupervisorJob.

Additional context
Lots of reading material on the issue:


BenHenning · 2020-08-28T23:17:21Z

NB: I suspect this is the reason we were seeing the app stop working when priming was enabled and a failure was encountered. We have probably seen this in a few places without realizing what the underlying cause was. This fix should significantly improve the stability of the app.

BenHenning · 2020-08-28T23:22:42Z

Also, per the SupervisorJob documentation, a child failure does not affect other children, so some thought needs to be put into how to handle failures in a chained DataProvider case (or even a standard DataProvider suspend function calling other suspend functions, triggering the creation of child coroutines).
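
One way to picture the chained case: the supervisor only isolates its direct children, while coroutines launched inside one of those children still share that child's regular Job. The sketch below assumes a chained provider can be modelled as a single direct child with nested steps; all names are illustrative.

```kotlin
import kotlinx.coroutines.CoroutineExceptionHandler
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Returns (chained operation was cancelled, unrelated operation completed normally).
fun chainedFailureStaysContained(): Pair<Boolean, Boolean> {
  val handler = CoroutineExceptionHandler { _, _ -> }
  val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default + handler)

  // Direct child of the supervisor: stands in for a chained or combined provider.
  val chained = scope.launch {
    // Long-running step; it is cancelled when its sibling step fails.
    launch { delay(Long.MAX_VALUE) }
    // Failing step; the failure propagates to `chained`, not to the supervisor.
    launch { throw IllegalStateException("step failed") }
  }
  // Another direct child of the supervisor: an unrelated operation.
  val unrelated = scope.launch { }

  runBlocking {
    chained.join()
    unrelated.join()
  }
  return chained.isCancelled to (unrelated.isCompleted && !unrelated.isCancelled)
}
```

With this layout the whole chain fails together while the unrelated operation completes, so both values should be true.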

BenHenning · 2020-08-28T23:25:13Z

Also: the AsyncResult part of DataProviders may be partly hiding this by aggressively catching exceptions in order to propagate them.

BenHenning · 2020-08-28T23:27:02Z

For context, I discovered this when trying to implement a CoroutineExecutorService for interop with Java services (e.g. Glide) that need to rely on ExecutorServices for async operations and that we want to coordinate with our test coroutine dispatchers.

BenHenning · 2020-08-28T23:28:57Z

https://kotlinlang.org/docs/reference/coroutines/exception-handling.html#supervision-scope may be a better way to go in general since it lets us hook into async in cases when we want a Deferred, and avoids needing to manually implement a cancellation policy. This is also closer to what we want: all children are cancelled if one fails, but the parent stays unaffected.
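
A small sketch of supervisorScope used with async, with illustrative function names and values. Per the kotlinx.coroutines documentation, supervisorScope cancels its children only when the scope itself fails, so a failing child leaves its siblings running and the failure surfaces at await().

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.supervisorScope

// Returns (error message of the failed child, value of the healthy child).
suspend fun loadIndependently(): Pair<String?, String> = supervisorScope {
  val failing = async<String> { throw IllegalStateException("load failed") }
  val healthy = async { "ok" }

  // The failure is only observed at await(); it did not cancel `healthy` or this scope.
  val failureMessage = try {
    failing.await()
    null
  } catch (e: IllegalStateException) {
    e.message
  }
  failureMessage to healthy.await()
}

// Blocking wrapper so the sketch can be called from non-suspending code.
fun loadIndependentlyBlocking(): Pair<String?, String> = runBlocking { loadIndependently() }
```

If the intent is instead for one failure to cancel every sibling while the caller survives, wrapping a coroutineScope call in try/catch may be the closer fit.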

BenHenning · 2020-08-29T00:46:54Z

Actually, it occurs to me all we might need to do is ensure there are different scopes for each independent thing that we want to execute. I think that better fits the paradigm for structured concurrency.
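
As a sketch of that idea, each independent operation could own its own scope with a plain Job, so that a failure only ever cancels the scope it happened in. newOperationScope is a hypothetical factory, not something that exists in the app.

```kotlin
import kotlinx.coroutines.CoroutineExceptionHandler
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.Job
import kotlinx.coroutines.isActive
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical factory: one scope per independent operation.
fun newOperationScope(): CoroutineScope =
  CoroutineScope(Job() + Dispatchers.Default + CoroutineExceptionHandler { _, _ -> })

// Returns (first scope still active, second scope still active) after a failure in the first.
fun failureStaysInItsOwnScope(): Pair<Boolean, Boolean> {
  val first = newOperationScope()
  val second = newOperationScope()
  runBlocking {
    // The failure cancels only the scope it was launched in.
    first.launch { throw IllegalStateException("failure in first operation") }.join()
    second.launch { }.join()
  }
  return first.isActive to second.isActive
}
```

Within each scope, children still fail together, which matches the structured-concurrency behavior wanted for chained providers.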

BenHenning added Type: Improvement Priority: Essential This work item must be completed for its milestone. labels Aug 28, 2020

BenHenning added this to the Beta milestone Aug 28, 2020

BenHenning mentioned this issue Sep 2, 2020

Fix part of #973: Introduce coroutine executor service #1764

Merged

MaskedCarrot assigned MaskedCarrot and unassigned MaskedCarrot Jan 28, 2021

BenHenning added temp: triage for beta and removed Status: Not started labels Jun 7, 2022

Broppia added issue_type_infrastructure Impact: Low Low perceived user impact (e.g. edge cases). labels Jul 29, 2022

BenHenning added the issue_user_developer label Sep 16, 2022

BenHenning moved this to Needs Triage in [Team] Developer Workflow & Infrastructure - Android Sep 16, 2022

BenHenning added this to [Team] Developer Workflow & Infrastructure - Android Sep 16, 2022

BenHenning added Issue: Needs Clarification Indicates that an issue needs more detail in order to be able to be acted upon. Z-ibt Temporary label for Ben to keep track of issues he's triaged. labels Sep 16, 2022

BenHenning removed this from the Beta milestone Sep 16, 2022

seanlip added bug End user-perceivable behaviors which are not desirable. and removed issue_type_infrastructure labels Mar 28, 2023

seanlip removed issue_user_developer labels Mar 29, 2023

seanlip moved this to Todo in [Team] Developer Workflow & Infrastructure - Android Jun 4, 2023

MohitGupta121 added the Work: High It's not clear what the solution is. label Jun 16, 2023

seanlip removed the Type: Improvement label Jun 16, 2023

seanlip removed the Priority: Essential This work item must be completed for its milestone. label Jun 28, 2023
