Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] ServiceBusSessionReceiverClient.acceptNextSession returns null #28337

Closed
2 of 3 tasks
lidkowiak opened this issue Apr 15, 2022 · 13 comments
Closed
2 of 3 tasks

[BUG] ServiceBusSessionReceiverClient.acceptNextSession returns null #28337

lidkowiak opened this issue Apr 15, 2022 · 13 comments
Assignees
Labels
bug This issue requires a change to an existing behavior in the product in order to be resolved. Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team Service Bus

Comments

@lidkowiak
Copy link

Describe the bug

ServiceBusSessionReceiverClient acceptNextSession returns sometimes null instead of throwing IllegalStateException exception

Exception or Stack Trace
The exception log and stack trace from our code (not directly from SDK)

java.lang.NullPointerException: Cannot invoke "com.azure.messaging.servicebus.ServiceBusReceiverClient.receiveMessages(int, java.time.Duration)" because "serviceBusReceiverClient" is null at <OUR_PACKAGE>.servicebus.processor.SessionsHandler.processSession(SessionsHandler.java:100) at <OUR_PACKAGE>.servicebus.processor.SessionsHandler.processSessions(SessionsHandler.java:73) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Thread.java:833)

Around same time we got following logs:

{"az.sdk.message":"","exception":"Cannot create receive link from a closed session., errorContext[NAMESPACE: <OUR SB NS>.servicebus.windows.net. ERROR CONTEXT: N/A, PATH: <OUR TOPIC>/subscriptions/<OUR-SUBSCRIPTION>]","connectionId":"MF_387eb6_1649755334607","sessionName":"<OUR TOPIC>/subscriptions/<OUR-SUBSCRIPTION>","entityPath":"<OUR TOPIC>/subscriptions/<OUR-SUBSCRIPTION>","linkName":"session-_349db7_1649889915864"}

To Reproduce

Unknown

Code Snippet

processSession(serviceBusSessionReceiverClient.acceptNextSession());

Expected behavior

Exception should be thrown.

Screenshots

Not related

Setup (please complete the following information):

  • Library/Libraries: com.azure:azure-messaging-servicebus:7.7.0
  • Java version: 17
  • Frameworks: Spring Boot 2.6.x

Additional context
Not related

Information Checklist
Kindly make sure that you have added all the following information above and checkoff the required fields otherwise we will treat the issuer as an incomplete report

  • Bug Description Added
  • Repro Steps Added
  • Setup information Added
@ghost ghost added needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Apr 15, 2022
@joshfree joshfree added Service Bus Client This issue points to a problem in the data-plane of the library. labels Apr 15, 2022
@ghost ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Apr 15, 2022
@joshfree joshfree added bug This issue requires a change to an existing behavior in the product in order to be resolved. needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. and removed question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Apr 15, 2022
@ghost ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Apr 15, 2022
@joshfree
Copy link
Member

Hi @lidkowiak thank you for filing this GitHub issue on ServiceBusSessionReceiverClient.acceptNextSession. Someone from the Service Bus SDK for Java development team will follow up with you shortly.

/cc @ki1729 @Azure/azsdk-sb-java

@ki1729
Copy link
Contributor

ki1729 commented Apr 15, 2022

@lidkowiak Can you please help clear up a couple of things for me?

  1. In the redacted exception stacktrace, it is referring to ServiceBusReceiverClient but in your code snippet, you are using a serviceBusSessionReceiverClient. I am not sure which client is actually expected to be initalized here.
  2. If possible, can you please share a larger code snippet than the one you have mentioned in the bug description? With the data I have right now, it is hard to determine where the null is being returned from.

@ki1729 ki1729 added the needs-author-feedback Workflow: More information is needed from author to address the issue. label Apr 15, 2022
@piotr-napadlek
Copy link

@ki1729 The null is returned by the call to this method:

We expect that the SessionBusSessionReceiverClient.acceptNextSession either returns us initialized instance of ServiceBusReceiver client or throws exception that we handle. For rare cases this method returns us null, which is undocumented and we believe is a bug.

@ghost ghost added the no-recent-activity There has been no recent activity on this issue. label Apr 25, 2022
@ghost
Copy link

ghost commented Apr 25, 2022

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

@lidkowiak
Copy link
Author

Dear @msftbot, we are waiting for the author's feedback.

@ghost ghost added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team and removed needs-author-feedback Workflow: More information is needed from author to address the issue. no-recent-activity There has been no recent activity on this issue. labels Apr 25, 2022
@ki1729
Copy link
Contributor

ki1729 commented May 5, 2022

@piotr-napadlek Thanks for the clarification. As you have mentioned, acceptNextSession() method should never return null here. We have stress tests running which seems to be creating this client without any issues. Can you please provide some sample code, incuding the configuration used for the session receiver - so that I can try to repro this on my end.

@ki1729 ki1729 added the needs-author-feedback Workflow: More information is needed from author to address the issue. label May 5, 2022
@lidkowiak
Copy link
Author

@ghost ghost removed the needs-author-feedback Workflow: More information is needed from author to address the issue. label May 6, 2022
@ki1729
Copy link
Contributor

ki1729 commented May 13, 2022

Thanks @lidkowiak! This helps. I have had tests running this code (removed - messageHandler) for >24 hours now. acceptNextSession() hasn't returned null yet. I will leave it running over the weekend just to be sure.

Just to get as close a test environment as possible, can you please share the following details about the servicebus namespace that you use (I have indicated the value I am using in my test run at the end):

  1. Standard or Premium tier - my value = Standard
  2. Number of topics - my value = 5
  3. Number of subscriptions in the topic used with SessionsHandler - my value = 2
  4. Number of sessions for the subscription used with SessionsHandler - my value = 10
  5. Config value for serviceBusSessionTopicConfig.maxConcurrentSessions() - my value = 5
  6. Config value for serviceBusSessionTopicConfig.prefetchCount() - my value = 1

@ki1729 ki1729 added needs-author-feedback Workflow: More information is needed from author to address the issue. and removed needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team labels May 13, 2022
@lidkowiak
Copy link
Author

Hi,
here is our setup:

  1. Standard
  2. 10
  3. 2 subscription per topic
  4. SessionId is per tenant in multi tenant application, we have up to 1500 sessionId, it varies over time
  5. It depends on the topic, we use value between 5 and 15
  6. 1

Are you able to verify whether it could be caused by maintenance on Service Bus?
Similar case by for Event Hub #24224 (comment)

@ghost ghost added the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label May 20, 2022
@ghost ghost removed the needs-author-feedback Workflow: More information is needed from author to address the issue. label May 20, 2022
@piotr-napadlek
Copy link

piotr-napadlek commented May 23, 2022

Hi, not sure if you are still monitoring but the problem does not occur daily. We have spike in exceptions caused by this null pointer at random times, around once a week. We've observed it again this weekend 5/22/2022, 1:01:59.000 PM UTC. It's usually connected to another kind of exceptions thrown by the library:


com.azure.core.amqp.exception.AmqpException: Cannot create receive link from a closed session., errorContext[NAMESPACE: <our namespace>.servicebus.windows.net. ERROR CONTEXT: N/A, PATH: <topic>/subscriptions/<subscription>]
at com.azure.core.amqp.implementation.ReactorSession.createConsumer(ReactorSession.java:360)
at com.azure.messaging.servicebus.implementation.ServiceBusReactorSession.createConsumer(ServiceBusReactorSession.java:208)
at com.azure.messaging.servicebus.implementation.ServiceBusReactorSession.createConsumer(ServiceBusReactorSession.java:107)
at com.azure.messaging.servicebus.implementation.ServiceBusReactorAmqpConnection.lambda$createReceiveLink$4(ServiceBusReactorAmqpConnection.java:213)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:125)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onNext$1(TracingSubscriber.java:58)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onNext(TracingSubscriber.java:58)
at reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onNext(FluxHide.java:137)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:127)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onNext$1(TracingSubscriber.java:58)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onNext(TracingSubscriber.java:58)
at reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onNext(FluxHide.java:137)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1816)
at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:249)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onNext$1(TracingSubscriber.java:58)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onNext(TracingSubscriber.java:58)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.complete(MonoIgnoreThen.java:292)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onNext(MonoIgnoreThen.java:187)
at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2398)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onSubscribe(MonoIgnoreThen.java:134)
at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:55)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:64)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:240)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onComplete(MonoIgnoreThen.java:203)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onComplete(TracingSubscriber.java:68)
at reactor.core.publisher.SerializedSubscriber.onComplete(SerializedSubscriber.java:146)
at reactor.core.publisher.SerializedSubscriber.onComplete(SerializedSubscriber.java:146)
at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.onComplete(FluxTimeout.java:234)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onComplete(TracingSubscriber.java:68)
at reactor.core.publisher.MonoNext$NextSubscriber.onComplete(MonoNext.java:102)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onComplete(TracingSubscriber.java:68)
at reactor.core.publisher.FluxFilterFuseable$FilterFuseableSubscriber.onComplete(FluxFilterFuseable.java:171)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onComplete(TracingSubscriber.java:68)
at reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onComplete(FluxHide.java:147)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onComplete(TracingSubscriber.java:68)
at reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onComplete(FluxHide.java:147)
at reactor.core.publisher.FluxReplay$SizeBoundReplayBuffer.replayNormal(FluxReplay.java:847)
at reactor.core.publisher.FluxReplay$SizeBoundReplayBuffer.replay(FluxReplay.java:944)
at reactor.core.publisher.FluxReplay$ReplaySubscriber.onComplete(FluxReplay.java:1351)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onComplete(TracingSubscriber.java:68)
at reactor.core.publisher.FluxPeek$PeekSubscriber.onComplete(FluxPeek.java:260)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onComplete(TracingSubscriber.java:68)
at reactor.core.publisher.FluxPeek$PeekSubscriber.onComplete(FluxPeek.java:260)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onComplete(TracingSubscriber.java:68)
at reactor.core.publisher.FluxMap$MapSubscriber.onComplete(FluxMap.java:142)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onComplete(TracingSubscriber.java:68)
at reactor.core.publisher.FluxDistinctUntilChanged$DistinctUntilChangedSubscriber.onComplete(FluxDistinctUntilChanged.java:173)
at reactor.core.publisher.FluxReplay$SizeBoundReplayBuffer.replayNormal(FluxReplay.java:847)
at reactor.core.publisher.FluxReplay$SizeBoundReplayBuffer.replay(FluxReplay.java:944)
at reactor.core.publisher.ReplayProcessor.tryEmitComplete(ReplayProcessor.java:465)
at reactor.core.publisher.SinkManySerialized.tryEmitComplete(SinkManySerialized.java:64)
at reactor.core.publisher.InternalManySink.emitComplete(InternalManySink.java:68)
at com.azure.core.amqp.implementation.handler.Handler.close(Handler.java:155)
at com.azure.core.amqp.implementation.handler.SessionHandler.onSessionFinal(SessionHandler.java:141)
at org.apache.qpid.proton.engine.BaseHandler.handle(BaseHandler.java:155)
at org.apache.qpid.proton.engine.impl.EventImpl.dispatch(EventImpl.java:108)
at org.apache.qpid.proton.reactor.impl.ReactorImpl.dispatch(ReactorImpl.java:324)
at org.apache.qpid.proton.reactor.impl.ReactorImpl.process(ReactorImpl.java:291)
at com.azure.core.amqp.implementation.ReactorExecutor.run(ReactorExecutor.java:91)
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Suppressed: java.lang.Exception: #block terminated with an error
at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:139)
at reactor.core.publisher.Mono.block(Mono.java:1731)
at com.azure.messaging.servicebus.ServiceBusSessionReceiverClient.acceptNextSession(ServiceBusSessionReceiverClient.java:86)
at <our_package>.SessionsHandler.processSessions(SessionsHandler.java:73)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
... 1 more

@piotr-napadlek
Copy link

Any progress/information about this one? Last night we've had alerts firing because of thousands of exceptions caused by this issue.

@lmolkova
Copy link
Member

@piotr-napadlek @lidkowiak we're going to mitigate this issue with #35139.

Sorry, it's been a while since you reported it, but if you still see the issue, could you please share logs (info or verbose) with us so we can see if we can improve the solution further? Thank you!

Copy link

Hi @lidkowiak, we deeply appreciate your input into this project. Regrettably, this issue has remained inactive for over 2 years, leading us to the decision to close it. We've implemented this policy to maintain the relevance of our issue queue and facilitate easier navigation for new contributors. If you still believe this topic requires attention, please feel free to create a new issue, referencing this one. Thank you for your understanding and ongoing support.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Apr 15, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Apr 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug This issue requires a change to an existing behavior in the product in order to be resolved. Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team Service Bus
Projects
None yet
Development

No branches or pull requests

5 participants