Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GRPC health endpoint is always succesfull on OCP and leads to "Type already defined" exception in dev mode #33219

Closed
fedinskiy opened this issue May 9, 2023 · 16 comments

Comments

@fedinskiy
Copy link
Contributor

Describe the bug

I have an application, which is deployed on Openshift. After I added GRPC health endpoint[1] to the app, I noticed, that health probes are always successful, even when endpoint should be throwing exceptions. When the application is started in dev mode locally, it throws exception java.lang.IllegalStateException: Type already defined: grpc.health.v1.HealthCheckRequest and doesn't not serve any endpoints. OpenShift logs for the same app doesn't contain the error.

[1] #32113

Expected behavior

Application should start in dev mode without exceptions. Health probes should fail on exceptions/error codes

Actual behavior

Startup/readiness/liveness probes on Openshift are successful. In dev mode, the application throws this exception:

2023-05-09 11:46:43,349 ERROR [io.qua.run.Application] (Quarkus Main Thread) Failed to start application (with profile [dev]): java.lang.IllegalStateException: Type already defined: grpc.health.v1.HealthCheckRequest
	at io.quarkus.grpc.runtime.reflection.GrpcServerIndex.processType(GrpcServerIndex.java:144)
	at io.quarkus.grpc.runtime.reflection.GrpcServerIndex.processFileDescriptor(GrpcServerIndex.java:115)
	at io.quarkus.grpc.runtime.reflection.GrpcServerIndex.<init>(GrpcServerIndex.java:64)
	at io.quarkus.grpc.runtime.reflection.ReflectionService.<init>(ReflectionService.java:32)
	at io.quarkus.grpc.runtime.GrpcServerRecorder.buildServer(GrpcServerRecorder.java:494)
	at io.quarkus.grpc.runtime.GrpcServerRecorder.devModeStart(GrpcServerRecorder.java:250)
	at io.quarkus.grpc.runtime.GrpcServerRecorder.initializeGrpcServer(GrpcServerRecorder.java:119)
	at io.quarkus.deployment.steps.GrpcServerProcessor$initializeServer1019312497.deploy_0(Unknown Source)
	at io.quarkus.deployment.steps.GrpcServerProcessor$initializeServer1019312497.deploy(Unknown Source)
	at io.quarkus.runner.ApplicationImpl.doStart(Unknown Source)
	at io.quarkus.runtime.Application.start(Application.java:101)
	at io.quarkus.runtime.ApplicationLifecycleManager.run(ApplicationLifecycleManager.java:111)
	at io.quarkus.runtime.Quarkus.run(Quarkus.java:71)
	at io.quarkus.runtime.Quarkus.run(Quarkus.java:44)
	at io.quarkus.runtime.Quarkus.run(Quarkus.java:124)
	at io.quarkus.runner.GeneratedMain.main(Unknown Source)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at io.quarkus.runner.bootstrap.StartupActionImpl$1.run(StartupActionImpl.java:104)
	at java.base/java.lang.Thread.run(Thread.java:833)

How to Reproduce?

  1. Create an application.
  2. Add quarkus-openshift and quarkus-grpc dependencies
  3. Add this health.proto file to src/main/proto/:
syntax = "proto3";

option java_multiple_files = true;
option java_package = "io.quarkus.example";

package grpc.health.v1;

message HealthCheckRequest {
    string service = 1;
}

message HealthCheckResponse {
    enum ServingStatus {
        UNKNOWN = 0;
        SERVING = 1;
        NOT_SERVING = 2;
        SERVICE_UNKNOWN = 3;  // Used only by the Watch method.
    }
    ServingStatus status = 1;
}

service HealthService {
    rpc Check(HealthCheckRequest) returns (HealthCheckResponse);
    rpc Watch(HealthCheckRequest) returns (stream HealthCheckResponse);
}
  1. Add this service:
import io.quarkus.example.HealthCheckRequest;
import io.quarkus.example.HealthCheckResponse;
import io.quarkus.example.HealthService
import io.quarkus.grpc.GrpcService;
import io.smallrye.mutiny.Multi;
import io.smallrye.mutiny.Uni;

@GrpcService
public class GrpcHealthService implements HealthService {

    @Override
    public Uni<HealthCheckResponse> check(HealthCheckRequest request) {
        return Uni.createFrom().failure(new RuntimeException("Error!"));
    }

    @Override
    public Multi<HealthCheckResponse> watch(HealthCheckRequest request) {
        return Multi.createFrom().failure(new RuntimeException("Error!"));
    }
}
  1. Run in devmode: mvn clean quarkus:dev. This fails
  2. Add deployment options for Openshift into application.properties :
quarkus.openshift.readiness-probe.grpc-action=9000:grpc.health.v1.HealthService
quarkus.openshift.startup-probe.grpc-action=9000:grpc.health.v1.HealthService
quarkus.openshift.liveness-probe.grpc-action=9000:grpc.health.v1.HealthService
quarkus.kubernetes-client.trust-certs=true
  1. Deploy on openshift: mvn clean install -Dquarkus.kubernetes.deploy=true
  2. Check status:
$ oc get pods
NAME                         READY   STATUS      RESTARTS   AGE
code-with-quarkus-1-build    0/1     Completed   0          12m
code-with-quarkus-1-deploy   0/1     Completed   0          12m
code-with-quarkus-1-wkpg8    1/1     Running     0          11m

$ oc describe pod code-with-quarkus-1-wkpg8
Name:         code-with-quarkus-1-wkpg8
Namespace:    fd-check
<...>
Start Time:   Tue, 09 May 2023 09:57:28 +0200
Labels:       app.kubernetes.io/managed-by=quarkus
              app.kubernetes.io/name=code-with-quarkus
              app.kubernetes.io/version=1.0.0-SNAPSHOT
              app.openshift.io/runtime=quarkus
              deployment=code-with-quarkus-1
              deploymentconfig=code-with-quarkus
Controlled By:  ReplicationController/code-with-quarkus-1
Containers:
  code-with-quarkus:
<...>
    Ports:          9000/TCP, 8443/TCP, 8080/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Tue, 09 May 2023 09:57:31 +0200
    Ready:          True
    Restart Count:  0
    Liveness:       grpc <pod>:9000 grpc.health.v1.HealthService delay=5s timeout=10s period=10s #success=1 #failure=3
    Readiness:      grpc <pod>:9000 grpc.health.v1.HealthService delay=5s timeout=10s period=10s #success=1 #failure=3
    Startup:        grpc <pod>:9000 grpc.health.v1.HealthService delay=5s timeout=10s period=10s #success=1 #failure=3
    Environment:
      KUBERNETES_NAMESPACE:  fd-check (v1:metadata.namespace)
      JAVA_APP_JAR:          /deployments/quarkus-run.jar
    Mounts:
   <...>
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 

QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason          Age   From               Message
  ----    ------          ----  ----               -------
  Normal  Scheduled       12m   default-scheduler  Successfully assigned fd-check/code-with-quarkus-1-wkpg8 to <...>
  Normal  AddedInterface  12m   multus             Add eth0 <...> from openshift-sdn
  Normal  Pulling         12m   kubelet            Pulling image "<...>"
  Normal  Pulled          12m   kubelet            Successfully pulled image <...>
  Normal  Created         12m   kubelet            Created container code-with-quarkus
  Normal  Started         12m   kubelet            Started container code-with-quarkus

A repriducer can be found here: https://github.com/fedinskiy/reproducer/tree/grpc-probes

Output of uname -a or ver

4.18.0-425.19.2.el8_7.x86_64

Output of java -version

11.0.18, vendor: Red Hat, Inc

GraalVM version (if different from Java)

No response

Quarkus version or git rev

3.0.1.Final

Build tool (ie. output of mvnw --version or gradlew --version)

Apache Maven 3.8.6 (84538c9988a25aec085021c365c560670ad80f63)

Additional information

$ oc version
Client Version: 4.11.0-202208020706.p0.g7075089.assembly.stream-7075089
Kustomize Version: v4.5.4
Kubernetes Version: v1.25.7+eab9cc9
@quarkus-bot
Copy link

quarkus-bot bot commented May 9, 2023

/cc @Sgitario (kubernetes), @alesj (grpc), @cescoffier (grpc), @geoand (kubernetes,openshift), @iocanel (kubernetes,openshift), @jmartisk (health), @xstefank (health)

@Sgitario
Copy link
Contributor

Sgitario commented May 9, 2023

About the first issue that you described: Run in devmode: mvn clean quarkus:dev. This fails. This is because the package name is wrong in the health.proto. Using health.proto worked for me:

syntax = "proto3";

option java_multiple_files = true;

package acme;

...

And about the second issue Health probes should fail on exceptions/error codes. I think this is an OpenShift question. Are you sure your OpenShift instance supports HTTP2 + gRPC probes? I could not find any relevant information about how to set up OpenShift to enable it, so I can't help much here. Yet Health probes should fail on exceptions/error codes seems very wrong to me as well. I think we should ask for some OpenShift expert or support to help/clarify if this feature is supported in OpenShift.

@fedinskiy
Copy link
Contributor Author

@Sgitario the manual for grpc health probes[1] explicitly suggests grpc.health.v1 and option java_package = "io.quarkus.example"; works for other services (eg helloworld.proto in the reproducer). Does health service have some additional requirements? If so, It would be nice to have them documented.

K8s docs[2] say, that GRPC probes are enabled by default starting with 1.24 and my cluster uses 1.25. Unless OCP folks disabled the feature, this should be working fine. In addition, if I change service name (eg quarkus.openshift.readiness-probe.grpc-action=9000:grpc.health.v1.HealthServic) then I see this is pod describe: Warning Unhealthy 10s (x5 over 39s) kubelet Readiness probe failed: service unhealthy (responded with "UNKNOWN").

[1] https://github.com/grpc/grpc/blob/master/doc/health-checking.md
[2] https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/#feature-gates-for-graduated-or-deprecated-features

@cescoffier
Copy link
Member

Quarkus implements the health service automatically.

@fedinskiy
Copy link
Contributor Author

@cescoffier how does it distinguish between GRPC and HTTP-based endpoints? Can we disable/overload the automatic implementation? If it it implemented automatically, then why there are options from the quarkus.openshift.N-probe.grpc-action family?

@cescoffier
Copy link
Member

We only implement and expose the gRPC health service. I don't believe the probes use it directly.

@fedinskiy
Copy link
Contributor Author

Could you explain this a bit? I have GRPC-based health service and I want OCP probes to use it. It looks like the new Quarkus feature[1] allows me to do exactly that. Is there some special considerations which I should be aware of?

[1] #32113

@Sgitario
Copy link
Contributor

Sgitario commented May 9, 2023

@fedinskiy I could play around with your reproducer and gRPC probes. Let's go step by step:

The first thing you need to address is the incompatibility package name:

@Sgitario the manual for grpc health probes[1] explicitly suggests grpc.health.v1 and option java_package = "io.quarkus.example"; works for other services (eg helloworld.proto in the reproducer).

Having the two packages are obviously wrong and I don't think we need to document this. The right one is grpc.health.v1, so use that.

The second thing is that the gRPC port (9000) is not exposed in the container and in the service. This is because you haven't implemented any bindable gRPC service (the package grpc.health.v1 is being ignored on purpose). So, you need to add an additional gRPC service, a simple hello world would do it.

The third thing is that you're using the gRPC service instead of the Kubernetes services in the gRPC probes:

quarkus.openshift.readiness-probe.grpc-action=9000:grpc.health.v1.HealthService

It should be:

quarkus.openshift.readiness-probe.grpc-action=9000:<name of the generated service>

The fourth thing is that as @cescoffier stated, the health service is being configured by default. If you want to provide your custom one, you need to disable:

quarkus.grpc.server.health.enabled=false
quarkus.grpc.server.grpc-health.enabled=false

After doing these four things, your deployment will still fail with:

Startup probe failed: error: this server does not implement the grpc health protocol (grpc.health.v1.Health): Method not found: grpc.health.v1.Health/Check

This is because your health.proto is wrong. For inspiration, you could use the one from Quarkus:

syntax = "proto3";

package grpc.health.v1;

message HealthCheckRequest {
  string service = 1;
}

message HealthCheckResponse {
  enum ServingStatus {
    UNKNOWN = 0;
    SERVING = 1;
    NOT_SERVING = 2;
  }
  ServingStatus status = 1;
}

service Health {
  rpc Check(HealthCheckRequest) returns (HealthCheckResponse);

  rpc Watch(HealthCheckRequest) returns (stream HealthCheckResponse);
}

And then you can implement your custom logic. And this will work fine.

Now, the possible improvements:

  • Automatically populate the gRPC action with the gRPC port and the generated service name.

In this case, we would need to add a new property:

quarkus.openshift.readiness-probe.grpc-enabled=true

Or perhaps, use an auto text:

quarkus.openshift.readiness-probe.grpc-action=auto

In any case, this should be reported by a separate ticket, so we could discuss the solutions further.

  • Using the generated gRPC Health service didn't work for me. It gave me this error:
Startup probe failed: service unhealthy (responded with "UNKNOWN")

Note that I didn't use the Quarkus Smallrye health extension.

The problem seems to be that the gRPC service is not registered in the gRPC storage. @cescoffier any ideas about how this should work / or if this is meant to work without the Quarkus Smallrye health extension?

But again, I think this should also be reported by a separate ticket.

  • Taking into account that these steps are not straightforward, perhaps we could add a section in the gRPC documentation or instead, this is food for a Quarkus blog post. Wdyt? @cescoffier

@fedinskiy
Copy link
Contributor Author

fedinskiy commented May 9, 2023

@Sgitario could you elaborate on health.proto being wrong? Since I can see the exception even when the only difference is in the package name:

$ diff -w /home/fedinskiy/code/quarkus/reproducer/src/main/proto/health.proto /home/fedinskiy/code/quarkus/quarkus/extensions/grpc/stubs/src/main/proto/health/v1/health.proto
3c3
< package org.acme;
---
> package grpc.health.v1;

Changing the package name leads to

java.lang.NoClassDefFoundError: grpc/health/v1/HealthOuterClass$HealthCheckResponse$ServingStatus
	at io.quarkus.grpc.runtime.health.GrpcHealthStorage.<init>(GrpcHealthStorage.java:34)

I also suggest to document the prohibition of grpc.health.v1 and io.grpc.reflection or (preferably) to add a warning in logs, since this directly contradicts the official guide. Same goes for implicitly enabled quarkus.grpc.server.health and
quarkus.grpc.server.grpc-health.

@cescoffier
Copy link
Member

These proto files are defined by gRPC, you cannot change them.

@Sgitario
Copy link
Contributor

@Sgitario could you elaborate on health.proto being wrong? Since I can see the exception even when the only difference is in the package name:

...

The health.proto you copied in the issue description is naming the service HealthService and it should be Health.

Changing the package name leads to

java.lang.NoClassDefFoundError: grpc/health/v1/HealthOuterClass$HealthCheckResponse$ServingStatus
	at io.quarkus.grpc.runtime.health.GrpcHealthStorage.<init>(GrpcHealthStorage.java:34)

Did you do the changes in step 4 above? These ones:

quarkus.grpc.server.health.enabled=false
quarkus.grpc.server.grpc-health.enabled=false

GrpcHealthStorage should not be injected if quarkus.grpc.server.health.enabled is disabled. So, it's likely to be a collision with your custom health service and the one provided by Quarkus that should be disabled.

I also suggest to document the prohibition of grpc.health.v1 and io.grpc.reflection or (preferably) to add a warning in logs, since this directly contradicts the official guide.

I'm ok with the warning, but why do you say that it directly contradicts the official guide?

Same goes for implicitly enabled quarkus.grpc.server.health and quarkus.grpc.server.grpc-health.

I don't understand what you want to document/clarify here.

@fedinskiy
Copy link
Contributor Author

@Sgitario yes, I changed files in the reproducer like that (see also updated reproducer):

application.properties:

quarkus.openshift.readiness-probe.grpc-action=9000:org.acme.Health
quarkus.openshift.startup-probe.grpc-action=9000:org.acme.Health
quarkus.openshift.liveness-probe.grpc-action=9000:org.acme.Health
quarkus.kubernetes-client.trust-certs=true
quarkus.grpc.server.health.enabled=false
quarkus.grpc.server.grpc-health.enabled=false

health.proto:

syntax = "proto3";

package org.acme;

message HealthCheckRequest {
    string service = 1;
}

message HealthCheckResponse {
    enum ServingStatus {
        UNKNOWN = 0;
        SERVING = 1;
        NOT_SERVING = 2;
    }
    ServingStatus status = 1;
}

service Health {
    rpc Check(HealthCheckRequest) returns (HealthCheckResponse);

    rpc Watch(HealthCheckRequest) returns (stream HealthCheckResponse);
}

GrpcHealthService.java:

import io.quarkus.grpc.GrpcService;
import io.smallrye.mutiny.Multi;
import io.smallrye.mutiny.Uni;

@GrpcService
public class GrpcHealthService implements Health {

    @Override
    public Uni<HealthOuterClass.HealthCheckResponse> check(HealthOuterClass.HealthCheckRequest request) {
        return Uni.createFrom().failure(new RuntimeException("Error!"));
    }

    @Override
    public Multi<HealthOuterClass.HealthCheckResponse> watch(HealthOuterClass.HealthCheckRequest request) {
        return Multi.createFrom().failure(new RuntimeException("Error!"));
    }
}

After deploy it leads to Startup probe failed: error: this server does not implement the grpc health protocol (grpc.health.v1.Health): Method not found: grpc.health.v1.Health/Check

It looks like grpc health probes require definition of service to have package grpc.health.v1 which is not allowed by Quarkus.

@Sgitario
Copy link
Contributor

It looks like grpc health probes require definition of service to have package grpc.health.v1 which is not allowed by Quarkus.

Quarkus does allow to use of this package name. The only thing is that it's not considered bindable (which I think it makes sense), and not being bindable only means that the port 9000 won't be exposed by the K8s service. If you add another gRPC service as I said in step two above, you'll see the gRPC port as a container port.

@fedinskiy
Copy link
Contributor Author

@Sgitario there is another service already (see file src/main/proto/helloworld.proto in the reproducer), and this doesn't help.

If my interpreattion is correct, than OCP grpc probes require service to be named grpc.health.v1.Health, but user-defined service with such name can not be binded to 9000 by quarkus. This means, that it is not possible to have user-defined health probes in practice, right? If it is possible, could you please tell me, which changes should be made to the service?

Sgitario added a commit to Sgitario/quarkus that referenced this issue May 17, 2023
Plus, I'm adding a new guide about the usage of gRPC services in Kubernetes.
Relates to quarkusio#33219 (comment)
@Sgitario
Copy link
Contributor

FYI: I've provided a pull request to add a new property that automatically populates the correct values of the gRPC probes: #33437

Sgitario added a commit to Sgitario/quarkus that referenced this issue May 24, 2023
Plus, I'm adding a new guide about the usage of gRPC services in Kubernetes.
Relates to quarkusio#33219 (comment)
@Sgitario
Copy link
Contributor

After adding the documentation and the new property to automatically populate the correct values, I think we can close now this pull request.
Note that adding custom gRPC Health endpoints is not supported since the gRPC extension is also doing so.
Feel free to reopen it if you think there is something more we could do!

michelle-purcell pushed a commit to michelle-purcell/quarkus that referenced this issue May 29, 2023
Plus, I'm adding a new guide about the usage of gRPC services in Kubernetes.
Relates to quarkusio#33219 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants