Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cloud_storage: Handle nested benign exception #10909

Merged
merged 1 commit into from
Jun 2, 2023

Conversation

abhijat
Copy link
Contributor

@abhijat abhijat commented May 22, 2023

If a nested exception is seen during an HTTP call where either the inner or outer exception is abort or gate closed, then the exception is logged on debug level, to be consistent with how these exception types (abort/gate closed) are handled in isolation.

If both the components of the nested exception are of any other kind, an error is logged.

Fixes #10706

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

  • none

@abhijat
Copy link
Contributor Author

abhijat commented May 22, 2023

ci failure is #10497

@abhijat abhijat requested review from VladLazar and andijcr May 22, 2023 15:47
@andijcr
Copy link
Contributor

andijcr commented May 22, 2023

I understand that the original failure was seastar::abort_requested_exception throw while handling seastar::abort_requested_exception, is gate_closed_exception expected to happen in the same way, or is it just future proofing?

If a nested exception is thrown where either the inner or outer
exception is abort requested or gate closed, this is logged as a debug
level error.

If the nested exception is of any other kind, an error level log is
generated.
@abhijat abhijat force-pushed the handle-nested-exception-logging branch from 6e20786 to 69b4c69 Compare May 22, 2023 16:10
@abhijat
Copy link
Contributor Author

abhijat commented May 22, 2023

I understand that the original failure was seastar::abort_requested_exception throw while handling seastar::abort_requested_exception, is gate_closed_exception expected to happen in the same way, or is it just future proofing?

Mostly future proofing, yes. I assumed a nested gate closed can happen in the same way as a nested abort can happen and will need to be handled the same way.

@abhijat abhijat requested a review from andijcr May 22, 2023 16:12
@@ -67,6 +88,14 @@ error_outcome handle_client_transport_error(
} catch (const ss::abort_requested_exception&) {
vlog(logger.debug, "Abort requested");
throw;
} catch (const ss::nested_exception& ex) {
if (has_abort_or_gate_close_exception(ex)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, do you know where this was being thrown? Just want to make sure we need to worry about unwrapping the other exception types

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also curious about this. is this something we all need to be on the look out for?

Copy link
Contributor Author

@abhijat abhijat May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This specific instance was during a put object request handled by the s3 client:

WARN  2023-05-11 22:30:33,807 [shard 0] s3 - s3_client.cc:681 - S3 request failed with error: seastar::nested_exception: seastar::abort_requested_exception (abort requested) (while cleaning up after seastar::abort_requested_exception (abort requested))
ERROR 2023-05-11 22:30:33,807 [shard 0] s3 - util.cc:71 - Unexpected error seastar::nested_exception: seastar::abort_requested_exception (abort requested) (while cleaning up after seastar::abort_requested_exception (abort requested))

In the same time frame there were several other put requests which seem to have failed but the error logging correctly logged them as DEBUG but the nested instance got logged as error:

[abhijat@fedora Downloads]$ grep -E 'util.*(Abort r|abort_r)' r.log
DEBUG 2023-05-11 22:30:33,796 [shard 0] s3 - util.cc:68 - Abort requested
DEBUG 2023-05-11 22:30:33,796 [shard 0] s3 - util.cc:68 - Abort requested
DEBUG 2023-05-11 22:30:33,801 [shard 0] s3 - util.cc:68 - Abort requested
ERROR 2023-05-11 22:30:33,807 [shard 0] s3 - util.cc:71 - Unexpected error seastar::nested_exception: seastar::abort_requested_exception (abort requested) (while cleaning up after seastar::abort_requested_exception (abort requested))
DEBUG 2023-05-11 22:30:33,812 [shard 0] s3 - util.cc:68 - Abort requested
DEBUG 2023-05-11 22:30:33,817 [shard 0] s3 - util.cc:68 - Abort requested
DEBUG 2023-05-11 22:30:33,825 [shard 0] s3 - util.cc:68 - Abort requested
DEBUG 2023-05-11 22:30:33,825 [shard 0] s3 - util.cc:68 - Abort requested
DEBUG 2023-05-11 22:30:33,825 [shard 0] s3 - util.cc:68 - Abort requested
DEBUG 2023-05-11 22:30:33,832 [shard 0] s3 - util.cc:68 - Abort requested
DEBUG 2023-05-11 22:30:33,838 [shard 0] s3 - util.cc:68 - Abort requested
DEBUG 2023-05-11 22:30:33,838 [shard 0] s3 - util.cc:68 - Abort requested
DEBUG 2023-05-11 22:30:33,842 [shard 0] s3 - util.cc:68 - Abort requested

AFAIK the issue here is how we classify and log exceptions in handle_client_transport_error so it should be enough to handle this specifically in this function?

Just want to make sure we need to worry about unwrapping the other exception types

Could you expand on this? Do you mean we should be handling the nested types differently?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to make sure we need to worry about unwrapping the other exception types

Could you expand on this? Do you mean we should be handling the nested types differently?

@abhijat thanks for the response. to clarify my question: might we have other places where a nested exception is being thrown but not handled correctly by unwrapping its inner/outer components.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dotnwat initially as mentioned in my comment here I assumed that we only need to handle this in the cloud storage client where we classify log levels, but now thinking more about it, anywhere we handle abort requested or gate closed exceptions differently from general exceptions, we may also need to examine nested exceptions to see if they should be classified as abort requested/gate closed.

This pattern is in a few places in the tree (for example, searching for handle_exception_type([](const ss::abort_requested_exception yields a few results) so I suspect there are more places where we should be unwrapping nested exceptions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏

@abhijat
Copy link
Contributor Author

abhijat commented Jun 2, 2023

/ci-repeat

@abhijat
Copy link
Contributor Author

abhijat commented Jun 2, 2023

CI failures: #10868 and #11151

@piyushredpanda
Copy link
Contributor

/backport v23.1.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants