Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EH receiver warnings #15077

Closed
DuneeshaFernando opened this issue Nov 6, 2020 · 7 comments · Fixed by #19171
Closed

EH receiver warnings #15077

DuneeshaFernando opened this issue Nov 6, 2020 · 7 comments · Fixed by #19171
Assignees
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Event Hubs Messaging Messaging crew needs-author-feedback Workflow: More information is needed from author to address the issue. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Bus uAMQP
Milestone

Comments

@DuneeshaFernando
Copy link

Is your feature request related to a problem? Please describe.
I'm using azure-sdk-for-python to receive events from EventHub. However, sometimes I notice that multiple warnings are being printed as logs continuously. Following are some of the types of logs being printed.

  1. ConnectionClose('ErrorCodes.UnknownError: Connection in an unexpected error state.')
  2. LinkDetach('ErrorCodes.LinkDetachForced: The link 'G29:73011:receiver-link-9b2943e1-75f2-4516-b262-157c0d77b397' is force detached. Code: ServerError. Details: InnerMessageReceiver was faulted. TrackingId:d1c56cee0000007f000244695f93a463_G29_B10, SystemTracker:*****:eventhub:*****~21503|$default, Timestamp:2020-10-28T04:50:45')
  3. LinkDetach('ErrorCodes.ConnectionCloseForced: The connection was inactive for more than the allowed 240000 milliseconds and is closed by container 'e95e6aa54ddf46f7a91cda36f927e262_G18'.')

Since, they do not seem to actually interrupt my applications, could you tell me how to get rid of those or if there is any mistake in my code, could I know how to correct it?

Describe the solution you'd like
If it could be confirmed that there is no problem in receiving events from EH, I suggest preventing printing those logs. The main reason for my suggestion is because sometimes it is difficult to track actual logs from my application due to the high frequency of the EH receiver warnings being printed.

Describe alternatives you've considered
N/A

Additional context
Please request for any details required.

@ghost ghost added needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Nov 6, 2020
@ghost ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Nov 6, 2020
@nadundesilva
Copy link

I am just curious whether we can't use a proper logger in the SDK with proper log levels. Then we can easily turn certain logs on and off.

Also as mentioned since these are not actual errors (the SDK can recover from them), they should be either warning or debug logs I think.

@KieranBrantnerMagee
Copy link
Member

Hey folks;
From what you're describing, I'm inclined to think this is a facet of a known issue documented here. These are raised from the UAMQP layer, and for legacy reasons were not well connected with the idiomatic logging levels as you suggest, which is something we fully intend to fix per the linked issue.

(If anyone is curious, the context is roughly "according to AMQP spec, these sort of failures are not normal and were above their threshold for "be noisy about it regardless", but in a servicebus/eventhub context they explicitly have a meaning and should be controlled ideally as you describe.)

To the point, your code is potentially not doing anything wrong if you're able to send and receive and continue to consume from partitions.

However, I do want to quickly eyeball the ConnectionClose though, if you have a stacktrace for that it might help facilitate, since that doesn't ring a bell like LinkDetach did for me. (and I'd want to make sure it comes from a similar root)
The LinkDetach failures in this case seem rooted in a known service side idle timeout prompted by inactivity > 4 minutes; which in some places should be triaged in later versions, so I'd be curious what version # you've seen this in; I should make sure this question is present in the bug template...

Thanks as always for reaching out, don't hesitate to ask for clarity if any of this ramble was unclear.

@DuneeshaFernando
Copy link
Author

Hi @KieranBrantnerMagee,
Yes, according to my understanding, the problem i encounter is a facet of the linked issue. As you have mentioned, amidst the EH receiver warnings my application continues to consume events from partitions without any visible problem. Regarding the ConnectionClose problem, I would say that nothing prints except ConnectionClose('ErrorCodes.UnknownError: Connection in an unexpected error state.') line and I don't have a stacktrace for it (If there is any way I could obtain the stacktrace, pls let me know). However, it is a commonly printed log. The versions I'm using are as follows.

  • azure-eventhub==5.1.0
  • azure-eventhub-checkpointstoreblob==1.1.0

Pls let me know if any more details are needed.

@realadeel
Copy link

Also getting the error: LinkDetach('ErrorCodes.UnknownError') after initializing the client.
Similar situation where we have checkpointing set up.

@yunhaoling yunhaoling modified the milestones: [2021] March, Backlog Feb 10, 2021
@yunhaoling yunhaoling added the Client This issue points to a problem in the data-plane of the library. label Feb 20, 2021
@yunhaoling
Copy link
Contributor

yunhaoling commented Feb 24, 2021

hey @DuneeshaFernando, @realadeel

sorry about missing the thread here.

I could provide more context:

  • The networking tracing could be turned on via logging_enable parameter when creating EventHubConsumerClient or EventHubProducerClient, e.g. EventHubConsumerClient(..., logging_enable=True)
  • LinkDetach('ErrorCodes.UnknownError') could happen due to transient errors. The error is reported from the underlying networking protocol library.
  • In the EventHub SDK, we have the retry logic which will try to re-establish the connection when encountering transient connection errors.

Those loggings are helpful for debugging, however, we're aware that it's kinda verbose nowadays.
To eliminate those verbose loggings we would need to redesign the underlying logging flow, add fine-grained configuration which would take some time.

We'll look into this when we have free cycles. I would recommend you to ignore those verbose loggings first.
Thanks for your understanding and patience in advance!

@lmazuel lmazuel added the Messaging Messaging crew label Apr 12, 2021
@lmazuel lmazuel assigned swathipil and unassigned yunhaoling May 19, 2021
@lmazuel lmazuel modified the milestones: Backlog, [2021] June May 19, 2021
@swathipil
Copy link
Member

swathipil commented May 27, 2021

Hey @DuneeshaFernando @realadeel -
Thank you for your patience on this! We've discussed changing the logging level of the uAMQP logs so that they are DEBUG/INFO level, rather than WARNING. We hope to make a decision about this in the upcoming week and implement the changes, so that the changes will be applied by the next uAMQP release.

Here is a temporary solution in the meantime to filter out the uAMQP logs that you are encountering. Add the following to the top of your code:

import logging
logging.basicConfig(level=logging.WARNING)
uamqp_logger = logging.getLogger('uamqp')
uamqp_logger.setLevel(logging.ERROR)

@yunhaoling
Copy link
Contributor

hey folks, apologize for not getting back to you sooner.

We had an internal discussion on the uamqp logging strategy and we're sorry to inform you that we decide not to change the logging level of those uamqp errors. -- It is recommended to use the built-in python logging module to suppress uamqp log warnings if needed as described by @swathipil.

import logging
uamqp_logger = logging.getLogger('uamqp')
uamqp_logger.setLevel(logging.ERROR)
# or even further fine-grained control, suppressing the warnings in uamqp.connection module
uamqp_connection_logger = logging.getLogger('uamqp.connection')
uamqp_connection_logger .setLevel(logging.ERROR)

Here's the reasoning:

  • We understand that the verbose logging is causing annoyments as the azure-eventhub sdk could recovery from certain uamqp errors via retries.

  • We revisited the uamqp logging strategy and the major concern for changing the logging level is that:

    • uamqp is a standalone amqp library which means users could build their applications directly on top of it. For those users, when errors occur and need to be handled by them, warning is the proper level of logging for indication.
  • We have thought about other options like:

    • introducing new options to configure the logging level of the uamqp library?
      • if so, then why not just use the python built-in module which is a more powerful tool for configuring loggings
    • escalating logging level internally to suppress warning logging?
      • we should not do this, logging configs should be controller by users not by the sdk.
    • can we update the retry logic in the azure-eventhub to deal with warnings?
      • since the warnings are from the uamqp while retry happens in azure-eventhub, there's little thing we could do in retry prevent warnings.

To conclude with, based on the aforementioned factors, we think it's better to give the logging control to users.

But we would also like to provide instructions in readme.md on how to configure uamqp logging with the built-in python logging module. I have created a separate issue for it: #19170.

Please let me know if you have any thoughts. We value your feedbacks and are completely open for discussion.

@yunhaoling yunhaoling added the needs-author-feedback Workflow: More information is needed from author to address the issue. label Jun 8, 2021
openapi-sdkautomation bot pushed a commit to AzureSDKAutomation/azure-sdk-for-python that referenced this issue Jul 5, 2021
T2 vmware (Azure#15077)

* sql t2 readme config

* readme config

* Update readme.python.md

* conflient resolve

* readme t2 config

Co-authored-by: msyyc <[email protected]>
openapi-sdkautomation bot pushed a commit to AzureSDKAutomation/azure-sdk-for-python that referenced this issue Jul 5, 2021
T2 vmware (Azure#15077)

* sql t2 readme config

* readme config

* Update readme.python.md

* conflient resolve

* readme t2 config

Co-authored-by: msyyc <[email protected]>
@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Event Hubs Messaging Messaging crew needs-author-feedback Workflow: More information is needed from author to address the issue. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Bus uAMQP
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants