Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow coroutines to fail without destabilizing the app #1741

Open
BenHenning opened this issue Aug 28, 2020 · 6 comments
Open

Allow coroutines to fail without destabilizing the app #1741

BenHenning opened this issue Aug 28, 2020 · 6 comments
Labels
bug End user-perceivable behaviors which are not desirable. Impact: Low Low perceived user impact (e.g. edge cases). Issue: Needs Clarification Indicates that an issue needs more detail in order to be able to be acted upon. Work: High It's not clear what the solution is. Z-ibt Temporary label for Ben to keep track of issues he's triaged.

Comments

@BenHenning
Copy link
Member

BenHenning commented Aug 28, 2020

Is your feature request related to a problem? Please describe.
When coroutines fail, they trigger their outer coroutine scope to enter a failure state. This is unexpected: it results in background failures causing all subsequent background tasks to stay failing preventing the app from working correctly anymore.

Describe the solution you'd like
We should:

  • Log & track exceptions in background tasks
  • Propagate that failure to the UI so that it can respond or show something to the user
  • Continue background execution by leveraging a SupervisorJob

For the most part, background execution happens via DataProviders, so this solution probably just needs to be built into NotifiableAsyncLiveData. We actually do want failures of children coroutines to trigger an outright failure of the job (including across transformed or combined data providers), but we don't want an independent background DataProvider operation to trigger a failure in an unrelated DataProvider.

Describe alternatives you've considered
SupervisorJob seems to be the mechanism invented for this purpose. Two alternatives might be:

  • Introducing a general purpose task system (effectively reinventing SupervisorJob or coroutines)
  • Not use coroutines (similar to above)

Both of these approaches seem harder & more involved than leveraging SupervisorJob.

Additional context
Lots of reading material on the issue:

@BenHenning BenHenning added Type: Improvement Priority: Essential This work item must be completed for its milestone. labels Aug 28, 2020
@BenHenning BenHenning added this to the Beta milestone Aug 28, 2020
@BenHenning
Copy link
Member Author

NB: I suspect this is the reason we were seeing the app stop working when priming was enabled and a failure was encountered. We probably have seen this in a few places without realizing what the underlying cause was. This fix should significantly improve the stability of the app.

@BenHenning
Copy link
Member Author

Also, per SupervisorJob documentation a child failure does not affect other children, so some thought needs to be put into how to handle failures in a chained DataProvider case (or even a standard DataProvider suspend function calling other suspend functions, triggering the creation of child coroutines).

@BenHenning
Copy link
Member Author

Also: the AsyncResult part of DataProviders may be partly hiding this by aggressively catching exceptions to propagate exceptions.

@BenHenning
Copy link
Member Author

For context, I discovered this when trying to implement a CoroutineExecutorService for interop with Java services that needs to rely on ExecutorServices for async operations and that we want to coordinate with our test coroutine dispatchers (e.g. Glide).

@BenHenning
Copy link
Member Author

BenHenning commented Aug 28, 2020

https://kotlinlang.org/docs/reference/coroutines/exception-handling.html#supervision-scope may be a better way to go in general since it lets us hook into async in cases when we want Deferred, and avoids needing to manually implement a cancellation policy. This also implements closer to we want: all children are cancelled if one fails, but the parent stays unaffected.

@BenHenning
Copy link
Member Author

Actually, it occurs to me all we might need to do is ensure there are different scopes for each independent thing that we want to execute. I think that better fits the paradigm for structured concurrency.

@Broppia Broppia added issue_type_infrastructure Impact: Low Low perceived user impact (e.g. edge cases). labels Jul 29, 2022
@BenHenning BenHenning added Issue: Needs Clarification Indicates that an issue needs more detail in order to be able to be acted upon. Z-ibt Temporary label for Ben to keep track of issues he's triaged. labels Sep 16, 2022
@BenHenning BenHenning removed this from the Beta milestone Sep 16, 2022
@seanlip seanlip added bug End user-perceivable behaviors which are not desirable. and removed issue_type_infrastructure labels Mar 28, 2023
@MohitGupta121 MohitGupta121 added the Work: High It's not clear what the solution is. label Jun 16, 2023
@seanlip seanlip removed the Priority: Essential This work item must be completed for its milestone. label Jun 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug End user-perceivable behaviors which are not desirable. Impact: Low Low perceived user impact (e.g. edge cases). Issue: Needs Clarification Indicates that an issue needs more detail in order to be able to be acted upon. Work: High It's not clear what the solution is. Z-ibt Temporary label for Ben to keep track of issues he's triaged.
Development

No branches or pull requests

5 participants