
[RestEasy Reactive Client] Use AsyncInputStream for Posting InputStream #37308

Closed

Conversation


@fredericBregier fredericBregier commented Nov 24, 2023

An InputStream was not taken into account when sending it with the RestEasy client to a REST service.

Why:
The previous implementation was putting all bytes into a single buffer, leading either to an OOME or to hitting the limit on the acceptable buffer size.

Change:
Add async InputStream support so that it does not fill memory and respects back pressure.

Note that, for this to work, quarkus.http.limits.max-body-size must be set to a suitable value (e.g. 0), since VertxInputStream currently applies a soft limit based on it. That behavior is not changed here, but it could be if necessary, since the body is no longer held in memory (not one buffer, but chunked mode with back pressure). Setting the value to 0, or to a large enough value, works.
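For reference, the property mentioned above would be set in application.properties; 0 is simply the example value from this description, not a general recommendation:

```properties
# Per the note above: max-body-size must be set appropriately, since
# VertxInputStream applies a soft limit based on it (example value from the PR).
quarkus.http.limits.max-body-size=0
```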

Add a test to check correctness (speed and memory), but it could be improved, since it only checks the status and/or the size of the InputStream at the end.
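The change described above can be illustrated with a minimal, self-contained sketch (plain Java, no Vert.x; the class and method names are illustrative, not from the PR): instead of copying the whole InputStream into one buffer, read it in fixed-size chunks and only read as many chunks as the consumer has requested, which is what the back pressure here boils down to.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the chunked-with-demand idea behind AsyncInputStream.
public class ChunkedReaderSketch {
    static final int CHUNK_SIZE = 8192; // "at least 8192", as in the commit message

    // Reads up to 'demand' chunks from the stream; returns the chunks actually read.
    static List<byte[]> readChunks(InputStream in, int demand) throws IOException {
        List<byte[]> out = new ArrayList<>();
        byte[] buf = new byte[CHUNK_SIZE];
        while (demand-- > 0) {
            int n = in.read(buf, 0, CHUNK_SIZE);
            if (n < 0) break; // end of stream
            byte[] chunk = new byte[n];
            System.arraycopy(buf, 0, chunk, 0, n);
            out.add(chunk);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[20000]);
        // Consumer asked for 2 chunks: memory use stays at 2 * 8192 bytes,
        // regardless of the total stream size.
        List<byte[]> chunks = readChunks(in, 2);
        System.out.println(chunks.size());        // 2
        System.out.println(chunks.get(0).length); // 8192
    }
}
```

The real implementation emits chunks asynchronously to the HTTP connection instead of collecting them, but the bounded-memory property is the same.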

@fredericBregier fredericBregier changed the title Use AsyncInputStream for Posting InputStream [RestEasy Reactive Client] Use AsyncInputStream for Posting InputStream Nov 24, 2023
@fredericBregier fredericBregier changed the title [RestEasy Reactive Client] Use AsyncInputStream for Posting InputStream Draft: [RestEasy Reactive Client] Use AsyncInputStream for Posting InputStream Nov 25, 2023
@fredericBregier fredericBregier force-pushed the feat/post_inputstream branch 2 times, most recently from b9bb289 to c0ea8f1 Compare November 25, 2023 13:46
@fredericBregier fredericBregier changed the title Draft: [RestEasy Reactive Client] Use AsyncInputStream for Posting InputStream [RestEasy Reactive Client] Use AsyncInputStream for Posting InputStream Nov 25, 2023
@fredericBregier
Author

fredericBregier commented Nov 25, 2023

Note 2: POST is about 4 times slower than GET, but still better than nothing.
Note 3: Using Netty as the low-level client side, performance is 4 times better on GET and 50% better on POST. But that's another story.

POST: up to 950 MB/s; using Netty directly as client: up to 1250 MB/s
GET: up to 950 MB/s; using Netty directly as client: up to 4000 MB/s

Member

@cescoffier cescoffier left a comment


Thanks!
A quick look didn't reveal anything bad, but we need to be careful.

We would need the review from @geoand and @vietj.

@cescoffier
Member

@vietj can you have a look at the Vert.x InputStream-to-ReadStream class?


@geoand
Contributor

geoand commented Nov 27, 2023

Very nice. This will take some time to review, more than the few minutes I currently have on my phone 😎

@fredericBregier
Author

Just a side note: when using the Netty client directly, outside Vert.x or Quarkus, I used HttpChunkedInput with a ChunkedWriteHandler to get maximum performance, but I didn't find a way to use it within the reactive client, even though Netty is underneath...

@geoand
Contributor

geoand commented Dec 6, 2023

I remember @vietj saying we should use pipe instead of pump :)
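For context on that remark (a hedged, Vert.x-flavoured pseudocode sketch, not code from this PR): Pump only forwards items with pause/resume flow control, while pipe additionally propagates stream completion and failure to the destination.

```
// pseudocode (Vert.x-flavoured), illustrative only

// pump: flow control only; completion/failure must be handled by hand
Pump.pump(readStream, writeStream).start();

// pipe: flow control plus end/failure propagation to the destination
readStream.pipeTo(writeStream, ar -> {
    if (ar.failed()) { /* failure propagated, destination cleaned up */ }
});
```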

@fredericBregier
Author

fredericBregier commented Dec 12, 2023

A bit of an improvement by fetching 3 buffers instead of 1 at startup (so memory consumption stays small) and fetching again each time a buffer is consumed (this seems to enable better pipelining between consumption and the network):

POST: from 950 MB/s to 1200 MB/s, compared to up to 3600 MB/s with Netty directly as client
GET: from 950 MB/s to 2000 MB/s, compared to up to 4400 MB/s with Netty directly as client

Note that increasing the startup fetch count from 2 to 5 keeps improving the result, still with no noticeable memory impact:

  • POST (prefetch 1 to 5): 950, 1100, 1200, 1250, 1300 MB/s
  • GET (prefetch 1 to 5): 950, 1500, 2000, 2250, 2300 MB/s

My conviction is that 3 should be the conservative option to get the best average performance while minimizing the memory impact (even if it is not noticeable). 4 might be the best choice, however. (Maybe an option for that?)
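The fetch policy described above can be sketched in a few lines (a plain-Java, single-threaded simulation with illustrative names, not the PR code): request PREFETCH buffers up front, then request exactly one more each time a buffer is consumed, so memory stays bounded by PREFETCH chunks while reading and writing can overlap.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simulation of the prefetch policy: initial demand of PREFETCH chunks,
// then one new fetch per consumed chunk. Verifies the in-flight count
// (i.e. memory held) never exceeds PREFETCH.
public class PrefetchSketch {
    static final int PREFETCH = 3; // the "conservative option" from the comment

    public static int maxInFlight(int totalChunks) {
        Deque<Integer> inFlight = new ArrayDeque<>();
        int fetched = 0;
        // initial demand: PREFETCH chunks
        while (fetched < totalChunks && inFlight.size() < PREFETCH) {
            inFlight.add(fetched++);
        }
        int max = inFlight.size();
        // each consumed chunk triggers exactly one new fetch
        while (!inFlight.isEmpty()) {
            inFlight.poll(); // consumer (network write) takes a chunk
            if (fetched < totalChunks) inFlight.add(fetched++);
            max = Math.max(max, inFlight.size());
        }
        return max; // never exceeds PREFETCH, however large the stream
    }

    public static void main(String[] args) {
        System.out.println(maxInFlight(1000)); // 3
    }
}
```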

@geoand
Contributor

geoand commented Dec 14, 2023

@vietj can you have a look at this, as it is very intense on the Vert.x side?

Thanks

@vietj

vietj commented Dec 21, 2023

@fredericBregier I had a quick look: as far as I understood, the AsyncInputStream will always be using the VertxBlockingInput, so do we have the guarantee that using the input stream never blocks?

@fredericBregier
Author

@vietj This implementation is inspired by various projects (not mine).
Of course, I cannot be 100% certain. I tried several approaches. I sometimes have an issue, never in a simple POST (one client -> one server), but when chaining (client -> (server -> internal client) -> another server). I tried to find out why, but haven't found the reason yet.

However, if there is another way, I can try if you have any idea.

The issue right now is that sending an InputStream (POST) from a client puts all bytes in memory, which leads to an OOME. In the other direction, getting an InputStream (GET) works (modulo some optimizations that I proposed too, which could be split into 2 MRs).

@fredericBregier
Author

@vietj And indeed, it is only in a double POST (client to server to another server) that VertxBlockingInput seems to block. But as it is under the hood, it seems difficult to bypass.
(Note again: in a single POST, with no chaining, there is no issue)

Any idea?

@fredericBregier
Author

fredericBregier commented Dec 21, 2023

Sorry for the spam-like messages...

I tried the following changes. I added a test with concurrent clients (10 threads). With the previous implementation based on VertxBlockingStream, it always failed (incorrect length). With this version, 8/10 threads are OK, but 2 still are not. So I hope I'm not far from a correct answer.

No blocking, but early closing of the InputStream... Still investigating.

(Applied on all 3 versions with small adaptations to context)

@fredericBregier
Author

Stopping the investigation for now: debugging suggests that Http1xServerConnection.onEnd is called too early, so that chunks go missing, but I don't understand why.
Sending with 1 thread, even multiple times, is OK. But with several threads running, it goes wrong (6/10 on average), using QuarkusRestClientBuilder or the native Quarkus REST client.

@fredericBregier
Author

@vietj I ended up with a new version which seems far more stable, both in sequential and concurrent usage.

  • VertxBlockingInput is now far less blocking, only on the necessary parts.
  • AsyncInputStream inlines the Vert.x dependency (InboundBuffer), with light modifications to simplify it.

@fredericBregier fredericBregier force-pushed the feat/post_inputstream branch 2 times, most recently from c1097bf to 7730d3c Compare December 27, 2023 15:19
@fredericBregier
Author

fredericBregier commented Jan 9, 2024

@vietj Hi, I know the end and beginning of the year are not the most favorable time for a review ;-)
Any chance that this time my proposal is closer to your thinking?

@vietj

vietj commented Jan 9, 2024

@fredericBregier I need to dedicate time to this, now seems more favorable indeed!

@fredericBregier
Author

Any news? I guess the next Quarkus release is probably taking all the available time...

An InputStream was not taken into account when sending it with the
RestEasy client to a REST service.

Why:
The previous implementation was putting all bytes into a single buffer,
leading either to an OOME or to hitting the limit on the acceptable
buffer size.

Change:
Add async InputStream support so that it does not fill memory and
respects back pressure.
Set maxChunkSize to the current MaxChunkSize, at least 8192.
Works both on sending and retrieving (server, client) an InputStream.

Fetch more than 1 buffer at startup (3, to still have a limit on memory
consumption).

Speedup:
- POST: from 900 MB/s to 1200 MB/s (almost close to Netty native POST
  at 1400 MB/s)
- GET: from 900 MB/s to 2000 MB/s (still half of Netty native GET at
  4400 MB/s)
In particular, improve VertxBlockingInput
@fredericBregier
Author

Hi @vietj
I just rebased the code to make it easier (no issue with the rebase at all).

@fredericBregier
Author

Hi @vietj
Any news? Maybe you're thinking of another way (a Vert.x-native way?), or is this proposal not OK?

@vietj

vietj commented Mar 3, 2024

@fredericBregier sorry, no time for that; I am very overloaded with the recent releases we have done. I will try to have a look soon (hint: this code is quite complex).

@fredericBregier
Author

I understand. I know the recent releases were huge!! No problem

@fredericBregier
Author

Hi @vietj
Just a short note about the latest Quarkus "main": ClientSendRequestHandler now has another implementation (far simpler than mine). I've tried it and it seems to fit. I got some new errors in my tests, but they are not related (a headers issue). I need to investigate.
I think there is still some room for optimization in the VertxXXXInputStream implementations, but it's not mandatory yet.

@fredericBregier
Author

fredericBregier commented Mar 16, 2024

@vietj OK, I found out that the next commit fixes the header issue (I was at that commit level, so I was missing it). I still have to test it, however.

Do you know when those commits will be included in an official release, so that we can easily close this MR?

@geoand
Contributor

geoand commented Mar 16, 2024

Hi @fredericBregier

Those commits are already part of Quarkus 3.8.3.

I am sorry that I had not realized my PR was essentially doing the same thing as yours.

@fredericBregier
Author

@geoand No issue: you've found another way that is far more readable! That's perfect! Thanks a lot!
The goal was to be able to handle an InputStream without keeping it fully in memory (or on disk), since a container cannot handle that for huge streams.

I checked it again using the last commits from main (partially imported, so as not to pull in everything). This works fine!

I think we can close this MR then; I will probably propose some optimizations on VertxXXXInputStream later on, and you'll see if they are interesting.

@geoand
Contributor

geoand commented Mar 16, 2024

Glad to hear it works for you!

Improvements are absolutely welcome 😁

@geoand geoand closed this Mar 16, 2024
@quarkus-bot quarkus-bot bot added the triage/invalid This doesn't seem right label Mar 16, 2024
@fredericBregier
Author

fredericBregier commented Mar 23, 2024

@geoand Commit #39282 is not in the 3.8.3 release, which results in bad header mapping (no headers at all).
I believe a quick fix should be done (3.8.4), since it breaks InputStream handling (I believe, however, that this does not affect other cases).

@geoand
Contributor

geoand commented Mar 23, 2024

You are right, that was my fault for not properly labeling the PR.

It should be part of the next 3.8 release

@fredericBregier
Author

No problem. As soon as I saw it, I sent this message so that it could be fixed in the next release.
Thank you again!

@geoand
Contributor

geoand commented Mar 23, 2024

Thank you for raising the issue!
