Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server Sent Events delaying 200 response #22762

Closed
nachogiljaldo opened this issue Jan 10, 2022 · 29 comments · Fixed by #23840
Closed

Server Sent Events delaying 200 response #22762

nachogiljaldo opened this issue Jan 10, 2022 · 29 comments · Fixed by #23840

Comments

@nachogiljaldo
Copy link
Contributor

Describe the bug

I am on 2.6.1.Final. I have an endpoint that is an SSE endpoint. The status code is only returned after the first element is produced.

This was not the case in 1.x and can lead to timeouts on clients.

Expected behavior

I would expect a 200 to be returned immediately (provided everything worked as expected).

Actual behavior

The status code, headers, etc. is not returned until the moment the first item is produced.

How to Reproduce?

getting-started.zip

Output of uname -a or ver

Linux jose-buguroo 5.11.0-41-generic #45-Ubuntu SMP Fri Nov 5 11:37:01 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Output of java -version

openjdk version "14.0.2" 2020-07-14 OpenJDK Runtime Environment (build 14.0.2+12-46) OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode, sharing)

GraalVM version (if different from Java)

No response

Quarkus version or git rev

2.6.1.Final

Build tool (ie. output of mvnw --version or gradlew --version)

Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f) Maven home: /home/jose/.m2/wrapper/dists/apache-maven-3.6.3-bin/1iopthnavndlasol9gbrbg6bf2/apache-maven-3.6.3 Java version: 14.0.2, vendor: Oracle Corporation, runtime: /home/jose/.sdkman/candidates/java/14.0.2-open Default locale: en_US, platform encoding: UTF-8 OS name: "linux", version: "5.11.0-41-generic", arch: "amd64", family: "unix"

Additional information

From what I see SseUtil.send is producing the response the first time an element is produced. Therefore, until the first element is generated, no call to it happens and no response is flushed. I see this could have to do with PublisherResponseHandler suspending the context. I wonder if an option could be to have that class handle SSE slightly different and generate a status code.

I see that adding:

requestContext.serverResponse().setStatusCode(200)
requestContext.serverResponse().setChunked(true)
requestContext.serverResponse().write(new byte[]{})

to PublisherResponseHandler works as expected (i.e. we get the status code, headers, etc. and then the events get produced as they come). Any reason you can think of not to do that?

@nachogiljaldo nachogiljaldo added the kind/bug Something isn't working label Jan 10, 2022
@nachogiljaldo
Copy link
Contributor Author

Besides, is there any workaround you can think so that I can inject that behavior somehow? I tried to create a bean extending ServerRestHandler but no dice.

@quarkus-bot
Copy link

quarkus-bot bot commented Jan 10, 2022

/cc @FroMage, @geoand, @stuartwdouglas

@geoand
Copy link
Contributor

geoand commented Jan 10, 2022

Besides, is there any workaround you can think so that I can inject that behavior somehow? I tried to create a bean extending ServerRestHandler but no dice.

Nope, there is no way for an application to control this behavior.

Any reason you can think of not to do that?

I'll leave it up to @FroMage to decide about this

@gsmet
Copy link
Member

gsmet commented Jan 11, 2022

I'm no SSE expert, just a user, but as a user I would expect the 200 to come right away and then receive the events.

Maybe @cescoffier has an opinion too as he played quite a lot with this.

@gsmet gsmet added this to the 2.7 - main milestone Jan 11, 2022
@cescoffier
Copy link
Member

The current behavior is the expected one (at least, I think).

First I think there is a misleading wording. The status is necessarily sent BEFORE the first item, or it would not be a correct Http frame.

The problem with SSE is propagating failures. By delaying the status until we get the first item, we can produce a 400/500 response if, instead of an item, we get a failure. Also, if we get a completion event, a 204 respond should (in theory) be produced.

If we do not wait, and produce a 200 immediately, we cannot handle the ´failure' and ´completion' event cases. Nothing prevent failures to happen afterwards, but we would have sent a few items first.

@geoand
Copy link
Contributor

geoand commented Jan 11, 2022

So should we close this?

@nachogiljaldo
Copy link
Contributor Author

nachogiljaldo commented Jan 11, 2022

The problem with SSE is propagating failures. By delaying the status until we get the first item, we can produce a 400/500 response if, instead of an item, we get a failure. Also, if we get a completion event, a 204 respond should (in theory) be produced.

I admittedly haven't read the SSE protocol description (in fact I am trying to find it) but I don't seem to be able to :/

However, the examples provided, don't really make sense in my mind, let's see why:

  • 500: you can get an error it when getting the first element, when getting the second, etc. I am not sure why the 1st should be special and result on a 500. IMHO here 500s would make sense if you get an error subscribing to whatever produces the items not when generating one item resulted on an error (that bears the question what should be done in that case!).
  • 400: my understanding is (and we're doing it in our API) is that 400 here would be before the first element is produced, in fact, because the request is wrong, no element will be ever produced?
  • 204: ok, I could take this one as valid if there was never a produced event

What concerns me the most is that the time until the first event comes could be arbitrarily long and cause timeouts, that can cause timeouts with some libraries. In the extreme case, an item might be never produced.

Not sure if my reasoning makes sense, also, if you know where to find the description of SSE protocol description, that would definitively help finding out the intended behavior.

@nachogiljaldo
Copy link
Contributor Author

nachogiljaldo commented Jan 11, 2022

A data point that could prove me wrong, but seems spring-boot handles it the same way: at least it behaves that way on https://github.com/vueda/spring-sse-example

edit: the same way --> the same way quarkus does it now

@kdubb
Copy link
Contributor

kdubb commented Jan 12, 2022

I agree that the status should be sent ASAP and we shouldn't be waiting for a first message.

@cescoffier It seems we could easily find a middle ground here. I would think that any resource method that has returned a Multi or Flow or equivalently setup JAX-RS's asynchronous & SSE systems should be considered a 200 OK. This allows the resource method to prepare things, and possibly throw an exception for 400, 401, etc. In all other cases a 200 should be assumed and sent.

Basically by the time control returns back to RestEasy, the resource had their chance to set things up and possible report errors, if not it should be a successful "setup".

WRT to the 204, what is the use? SSE is designed for long lived connections that are expecting streams of events. What would be the harm in sending a "200", then not sending any events and finally closing the connection. IMHO, as an SSE client, this is what I would expect not a 204; that seems like some kind of premature optimization that is wholly unnecessary.

@kdubb
Copy link
Contributor

kdubb commented Jan 12, 2022

Here's the spec, it's pretty simple. https://html.spec.whatwg.org/multipage/server-sent-events.html#dispatchMessage. There isn't anything about the specific topic of when to send the status code from the server.

@kdubb
Copy link
Contributor

kdubb commented Jan 12, 2022

According to the spec. it requires an OPEN event and a connection announcement to be made at the start of each event stream.

I cannot see how RestEasy should be waiting any period of time beyond what is absolutely necessary if the client is expected to be sending an OPEN event when the connection is made.

@nachogiljaldo
Copy link
Contributor Author

What would be the harm in sending a "200", then not sending any events and finally closing the connection.

Furthermore, it might be impossible in many cases to predict whether you will be returning an event ever.

I cannot see how RestEasy should be waiting any period of time beyond what is absolutely necessary if the client is expected to be sending an OPEN event when the connection is made.

Just for my understanding @kdubb , are you talking about RestEasy as client or as server here?

@kdubb
Copy link
Contributor

kdubb commented Jan 13, 2022

I'm only ever referring to RestEasy server here. By "client" in the above quote I meant an EventSource as defined by the WHATWG Spec.

@gsmet gsmet removed this from the 2.7 - main milestone Jan 18, 2022
@FroMage
Copy link
Member

FroMage commented Jan 19, 2022

I seem to recall the spec said we had to send the response ASAP, and that we had something that did sent it straight away when I wrote it. Lemme find it.

@FroMage
Copy link
Member

FroMage commented Jan 19, 2022

Yeah, the spec mandates this in https://quarkus.io/specs/jaxrs/2.1/index.html#sse_pipeline:

For compatibility purposes, implementations MUST initiate processing of an SSE response when either the first message is sent or when the resource method returns, whichever happens first

SseEventSinkImpl even has this:

    public void sendInitialResponse(ServerHttpResponse response) {
        if (!response.headWritten()) {
            SseUtil.setHeaders(context, response);
            // send the headers over the wire
            context.suspend();
            response.write(EMPTY_BUFFER, new Consumer<Throwable>() {
                @Override
                public void accept(Throwable throwable) {
                    if (throwable == null) {
                        context.resume();
                    } else {
                        context.resume(throwable);
                    }
                    // I don't think we should be firing the exception on the broadcaster here
                }
            });
            //            response.closeHandler(v -> {
            //                // FIXME: notify of client closing
            //                System.err.println("Server connection closed");
            //            });
        }
    }

So it should work…

@FroMage
Copy link
Member

FroMage commented Jan 19, 2022

Sent by:

/**
 * Our job is to send initial headers for the SSE request
 */
public class SseResponseWriterHandler implements ServerRestHandler {

    public static final SseResponseWriterHandler INSTANCE = new SseResponseWriterHandler();

    public SseResponseWriterHandler() {
    }

    @Override
    public void handle(ResteasyReactiveRequestContext requestContext) throws Exception {
        requestContext.getSseEventSink().sendInitialResponse(requestContext.serverResponse());
        requestContext.suspend();
    }
}

@FroMage
Copy link
Member

FroMage commented Jan 19, 2022

So it does look like a bug. Sorry I don't have time to fix it, but I'd gladly review any PR you have to make it work again. I'm pretty surprised that the TCK doesn't test this, though. I thought it did.

@cescoffier
Copy link
Member

@FroMage so basically, we would send a response event before we get a subscription from the upstream? That sounds risky.

I can understand the "we should not wait for the first message" argument, but the method return part is puzzling me.

@FroMage
Copy link
Member

FroMage commented Jan 21, 2022

The method's caller is the upstream and registers on the stream as soon as the method returns, no?

@cescoffier
Copy link
Member

Yes, but, it must wait to receive the Subscription object before doing anything (request). You won't get a failure or anything until you receive that object (that would be a violation of the protocol).

@kdubb
Copy link
Contributor

kdubb commented Jan 26, 2022

I realize JAX-RS already requires sending the status, etc. after the request method returns but I wanted to point out something I saw mentioned in the html spec at https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events.

Clients will reconnect if the connection is closed; a client can be told to stop reconnecting using the HTTP 204 No Content response code.

So it seems 204 has a very special meaning after all and the request method should probably be the only code that returns it; and it should do so for a specific reason.

@cescoffier
Copy link
Member

So it seems 204 has a very special meaning after all and the request method should probably be the only code that returns it; and it should do so for a specific reason.

Which is typically translated by a completion event on the stream (nothing to consume). As 204 cannot be sent after the headers have been sent in the response, to implement that behavior we need to wait for the subscription event and a potential completion event (which means waiting for the first item).

That being said, all these strategies are really opinions about how SSE should be implemented. I believe we need to introduce a way to configure the desired behavior as it's impossible to reconcile all the different opinions.

@geoand
Copy link
Contributor

geoand commented Jan 26, 2022

I believe we need to introduce a way to configure the desired behavior as it's impossible to reconcile all the different opinions.

I very much agree with this

@kdubb
Copy link
Contributor

kdubb commented Jan 26, 2022

The biggest problem I have with the current implementation is it triggers a timeout. With no status until first message the client is essentially waiting for HTTP connection establishment to finish and sees the server as busy or hung. The Timeout triggers a reconnect but that defeats a lot of the purpose of SSE in the first place. It's essentially polling until it receives a message.

I believe we need to introduce a way to configure the desired behavior as it's impossible to reconcile all the different opinions.

I'm all for configurability but, and not to belabor the point, given the specs and usage experience, I just don't see how the current functionality could be allowed to stand.

@cescoffier As @FroMage eluded to, if the stream is immediately subscribed to upon completion of the resource method, then that's when the status should be returned. My point in bringing up the special meaning of the 204 was to point out that we should not be waiting around for it as normal functionality. If a server wants to "shut off" events to a specific client they can throw an exception that generates a 204 response; which IMO is more in line with the exceptionality/special meaning of a 204 response.

@kdubb
Copy link
Contributor

kdubb commented Jan 26, 2022

One more note about the timeout... using a long timeout to overcome this issue is really a non-starter. If we increase the timeout then we will not reconnect in a timely fashion when a server/connection really is hung.

@cescoffier
Copy link
Member

@cescoffier As @FroMage eluded to, if the stream is immediately subscribed to upon completion of the resource method, then that's when the status should be returned.

That's not how it works. Yes, you subscribe, no you are not subscribed. You are subscribed when the stream gives you a Subscription object. This is async.

Returning a status + headers when you subscribe and not when you are subscribed means that you will not be able to return anything else than 200.

@cescoffier
Copy link
Member

Testing a few more frameworks (outside of the Java space). It seems like most do like we are doing today.

@kdubb
Copy link
Contributor

kdubb commented Jan 26, 2022

Returning a status + headers when you subscribe and not when you are subscribed means that you will not be able to return anything else than 200.

Exactly... and RR should never send anything except 200. If a resource method needs to send something else they are free to throw an exception and return any status code they like; this is how authentication issues are handled currently.

@cescoffier
Copy link
Member

Yes and No. Authentication happens before. Here, I refer to the stream itself. If the stream sends a failure, you can't do anything (except serialize the failure into the SSE which is somewhat tricky and uncovered by any spec) and close the connection. The current approach does not cover everything, but at least immediate (by immediate I mean the first event it's still async), failure or completion, is handled.

It seems that the spec was written with a synchronous model in mind, and we will have to make it fit. I can foresee many issues once we switch to that model, but maybe it would be simpler to understand (you always get success even when we know we are going to fail)

I don't say that the current approach is right, it just picked a different angle.

So, let's implement this synchronous behavior and see how we will handle stream failures and empty streams once done.

cescoffier added a commit to cescoffier/quarkus that referenced this issue Feb 21, 2022
cescoffier added a commit to cescoffier/quarkus that referenced this issue Feb 21, 2022
Repository owner moved this from Todo to Done in Quarkus Roadmap/Planning Feb 21, 2022
@quarkus-bot quarkus-bot bot added this to the 2.8 - main milestone Feb 21, 2022
@gsmet gsmet modified the milestones: 2.8 - main, 2.7.2.Final Feb 21, 2022
gsmet pushed a commit to gsmet/quarkus that referenced this issue Feb 21, 2022
…he Publisher (so before subscription).

Fix quarkusio#22762

(cherry picked from commit 23ab5a0)
@cescoffier cescoffier self-assigned this Apr 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

6 participants