-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unblock SmallRye Health exposed routes #37352
Conversation
0f91c2e
to
2b637a8
Compare
I've tested this change for the KC project and it works as we would expect it to work: When there is no check or only asynchronous checks, the execution path doesn't touch a the Quarkus thread pool and no requests are queued. This way, we won't see a timeout in an overload situation where blocking requests are queued, and no error when load shedding on executor queue length is active. Thank you very much! |
2b637a8
to
7af7de5
Compare
I'll force push this since it seems that CI is stuck |
7af7de5
to
72711fb
Compare
This comment has been minimized.
This comment has been minimized.
@geoand, @cescoffier can you take a look please? |
Is there anyway we can add a test similar to #36977? |
f8194e0
to
4c55998
Compare
@geoand, added test with the user scenario. |
This comment has been minimized.
This comment has been minimized.
4c55998
to
85fdcb8
Compare
This comment has been minimized.
This comment has been minimized.
85fdcb8
to
ba1a3aa
Compare
Refactored the new test into a new additional-tests module because of cyclic dependency. I don't like creating a maven module just for testing but I'm not able to reproduce the issue in different way. |
This comment has been minimized.
This comment has been minimized.
fa055db
to
52153f2
Compare
Couldn't sleep about the new test module so I found a way how to avoid it :D. The new |
This looks good to me, but we'll need a +1 from @cescoffier as well |
This comment has been minimized.
This comment has been minimized.
public void testRegisterHealthOnBlockingThreadStep1() { | ||
// wait for the initial startup health call to finish | ||
try { | ||
Thread.sleep(5000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea how we could use Awaitability instead of this hard coded sleep?
This is likely going to fail on slow systems and introduce delay on fast ones.
try { | ||
Thread.sleep(5000); | ||
} catch (InterruptedException e) { | ||
throw new RuntimeException(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just throw the InterruptedException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't since it's override.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overridden from where?
if (!inMemoryLogHandler.getRecords().isEmpty()) { | ||
LogRecord logRecord = inMemoryLogHandler.getRecords().get(0); | ||
assertEquals(Level.WARNING, logRecord.getLevel()); | ||
assertFalse(logRecord.getMessage().contains("has been blocked for"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we iterate over the whole set of messages? It might not be the last one.
static final class BlockingHealthCheck implements HealthCheck { | ||
@Override | ||
public HealthCheckResponse call() { | ||
// block for 3s which is more than allowed default blocking duration of eventloop (2s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead I would you an illegal operation such as:
Uni.createFrom().item(42).onItem().delayIt().by(Duration.ofMillis(10)).await().indefinitely();
If you are on the event loop, this operation will be rejected as you are not allowed to use await()
.
52153f2
to
188c343
Compare
@xstefank Thanks for the PR! Keycloak would be intersted in getting this into 3.7/3.8. :) |
Hello, After this change (I just updated from 3.6.2 to 3.8.2, but the last working version is 3.6.9 - also works if downgrade only the The request also doesn't return (postman keeps spinning) the stacktrace is the following:
Any hints what could have been wrong ? Edit: |
@ivivanov-bg is there any cahcne you can attach a sample application that exhibits the behavior you describe? Thanks |
I will try (not sure how much time it will take though). If anyone has hints - please share, if not - I will post back again when I have the sample application (for now - my workaround is to downgrade Thanks |
In the case of the previous bug, there was nothing the application could do, it was a subtle bug in the way Quarkus handles health checks. |
Attached is the sample app:
But it fails when run as:
|
Thanks |
@xstefank seems like you have your work cut out for you 😉 ^ |
Is this because of the custom identity provider ? |
Could very well be |
Actually - I just tested it I tried creating the |
I'm not sure why, but in your sample app |
@ivivanov-bg A quick fix on your side:
This works but I need to verify with Vert.x guys if this is something we should be handling in sr-health extension. |
No, do NOT do that, you are creating a new instance of Vert.x, unmanaged and unclosed. You need:
|
I totally missed that you're creating a single thread executor. So better would be: @Alternative
@Priority(1)
@ApplicationScoped
@Slf4j
@RequiredArgsConstructor(onConstructor_ = @Inject)
public class CustomAuthenticationMechanism implements HttpAuthenticationMechanism {
private final BasicAuthenticationMechanism basicAuth;
@Inject
Vertx vertx;
@Override
public Uni<SecurityIdentity> authenticate(RoutingContext context, IdentityProviderManager identityProviderManager) {
return basicAuth.authenticate(context, identityProviderManager)
.emitOn(command -> vertx.getOrCreateContext().runOnContext(v -> command.run()))
.onItem()
.transformToUni(securityIdentity -> {
if (securityIdentity != null && !securityIdentity.getRoles().contains("admin")) {
final String username = securityIdentity.getPrincipal().getName();
if (username == null || username.equals("test1")) {
return Uni.createFrom()
.failure(new ForbiddenException(String.format("User %s not allowed to access %s", username, context.normalizedPath())));
}
}
return Uni.createFrom().item(securityIdentity);
});
}
@Override
public Uni<ChallengeData> getChallenge(RoutingContext context) {
return basicAuth.getChallenge(context);
}
@Override
public Set<Class<? extends AuthenticationRequest>> getCredentialTypes() {
return Set.copyOf(basicAuth.getCredentialTypes());
}
} So this will either get existing or create new context (but not the whole vertx). What I still don't understand why this |
The
The second point breaks the health checks, as it expects to be called in a duplicated context (I just discussed a fallback with @xstefank). Note that it can break more than health checks, but tracing too. @sberyozkin, could we imagine a safety guard before calling the authentication mechanism that will capture the duplicated context and restore it if the user code does not do it? (Can you point me to the code calling this?) |
There is actually one part in the actual code missing in the sample app. I need to perform a DB operation to fetch some data based on the called URL. When using the suggested approach So I ended up with:
Thank you all for the help |
@ivivanov-bg You need to capture the context and then restore it: @Inject Vertx vertx;
@Override
public Uni<SecurityIdentity> authenticate(RoutingContext context, IdentityProviderManager identityProviderManager) {
Executor contextExecutor = MutinyHelper.executor(Vertx.currentContext()); // Gets an executor restoring the current context.
return basicAuth.authenticate(context, identityProviderManager)
.emitOn(Infrastructure.getDefaultExecutor()) // Switch to a worker thread
.onItem()
.transformToUni(securityIdentity -> {
if (securityIdentity != null && !securityIdentity.getRoles().contains("admin")) {
final String username = securityIdentity.getPrincipal().getName();
if (username == null || username.equals("test1")) {
return Uni.createFrom()
.failure(new ForbiddenException(String.format("User %s not allowed to access %s", username, context.normalizedPath())));
}
}
return Uni.createFrom().item(securityIdentity);
})
.emitOn(contextExecutor); // Switch back to the context
} |
I don't think it's a good idea to accept this can happen. If user is really going to do that, he will know what he is doing because he is an expert I suppose. For majority of users I suggest to do not deal with threads (it will seem a low level to many of users), but instead work with API. I think the example should look like this: @Inject BlockingSecurityExecutor blockingExecutor;
@Override
public Uni<SecurityIdentity> authenticate(RoutingContext context, IdentityProviderManager identityProviderManager) {
return blockingExecutor
.executeBlocking(() -> basicAuth.authenticate(context, identityProviderManager).await().indefinitely())
.onItem()
.transformToUni(securityIdentity -> {
if (securityIdentity != null && !securityIdentity.getRoles().contains("admin")) {
final String username = securityIdentity.getPrincipal().getName();
if (username == null || username.equals("test1")) {
return Uni.createFrom()
.failure(new ForbiddenException(String.format("User %s not allowed to access %s", username, context.normalizedPath())));
}
}
return Uni.createFrom().item(securityIdentity);
});
} We should probably document this if there was an agreement on the solution, because giving advice inside some PR has almost no impact on others in same situation. Also if there is a common case emitting |
+100 |
Fixes #35099
Issue #36977 caused by the original fix is addressed with the help of @computerlove (big thanks Marvin!).
For a quick reference, the issue was caused by the checks still being initialized in the blocking thread (through scheduler) which initialized them to run on wrong thread. This fix contains fixes to extensions/smallrye-health/runtime/src/main/java/io/quarkus/smallrye/health/runtime/QuarkusAsyncHealthCheckFactory.java that will mitigate this scenario and make sure that: