-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Combine extended flow messages into a single composite message #138
Comments
Thanks for bringing this to our attention. Admittedly, we haven't had the amount of time to look into potential performance issues as we would've liked to. Is there a chance you prepare the example as JMH benchmark so that we can take a look? @mp911de's microbenchmark runner might be helpful with that. |
I placed a benchmark code using JMH in dbc-benchmark. As I wrote earlier, my code uses r2dbc-postgresql, r2dbc-client, r2dbc-pool. First I'd like to find out which modules cause the performance issue. |
Thanks for preparing this. I'll have a look early next week. In general, I usually just run the benchmarks and attach a profiler (like YourKit) to see where time is spent. |
Have you tried it without using the connection pool? HikariCP does some things with the connections it manages by putting some items thread local storage and only going across boundaries to steal connections from others if it needs it. Since you are doing all the work in a single thread, it's probably reusing the same connection over and over again. I would do a test without r2dbc-pool and hikaricp at all, and just do it on a single connection to see if it's even the database driver at all introducing overhead. |
@skroll |
That sounds like you're doing parallel work then. If you are not using a connection pool, there should only be 1 connection active. Were you creating a connection each time? |
I used following code dbc-benchmark/without-connection-pool. |
@ksky8864 the code looks okay. Benchmarking without pool usage includes the expensive connection creation phase. |
Fired up wireshark to take a look at the packets underneath the hood. When doing a select statement, with a single parameter with the JDBC driver, the driver does the Parse, Bind, Describe, Execute actions in the same message which gets sent in a single TCP packet. The R2DBC driver does the following: I'm guessing this is a part of where you are seeing some differences in performance. |
Thanks a lot for having a look @skroll. We could create composite messages to combine the extended flow into a single package. These messages are issued in |
I updated the ticket title to reflect the actual goal. |
For completions sake uploaded wireshark dumps of the same query being performed through jdbc and the r2dbc-postgresql driver. It seems like the jdbc driver just sends Parse/Bind/Describe/Execute/Sync while the r2dbc-postgresql implementation sends Parse/Describe/Sync/Bind/Describe/Execute/Close/Sync. Would combining the extended flow messages into a single composite message reduce the number of messages needed to be sent? I'm guessing you don't need a named portal when using a composite message and doing a simple statement with parameter binding. |
I'm not terribly familiar with the extended flow yet and I'm not sure why we additionally Describe+Sync and Describe before execute. Probably we could improve on that end. A composite message would combine buffers instead of putting each message into its own packet and it would flush only once after sending the message to the transport. Internally, the message structure would remain the same and the database server shouldn't notice any difference. |
OK, so I figured out why there's double describe/sync. The IndefiniteStatementCache is calling ExtendedQueryMessageFlow.parse, which sends Parse, Describe, Sync. Then the actual ExtendedQueryMessageFlow.execute is called when the query is invoked, which does Bind, Describe, Execute, Close, Sync. It only happens the first time for a query until it is cached. |
I did some experiments with pre-allocated messages and streamlined flushing behavior. By reducing the number of flushes we can roughly gain about 10% throughput in the extended query flow: Baseline:
Composite message:
There is however still a significant gap between the extended flow in JDBC and the extended flow in R2DBC. Caching the Simple flow figures are quite close so we might want to focus on the gap in the extended flow and why this is. Maybe some more knowledgeable is able to help us out. I pushed the code changes the |
Could be related to naming every portal/prepared statement but still calling Parse, which the JDBC driver does not do. IIRC, the JDBC driver only names prepared statements that are called X times in the same session (I think its 5 by default). Afterwards, once it has named the statement, it no longer uses the Parse or Describe messages. It simply calls Bind, Execute, Sync. I just tested this using wireshark, and the behavior matches what I expected. I have a feeling that naming a statement over and over again, while ending with "Close" is incurring overhead. Postgres has a concept of "unnamed statements" which don't require the "Close" operation. I think that statements should only be named if they are re-used, and they shouldn't be closed each time. If the driver doesn't intend on persisting named statements on the server side, they should never be named. Also, from the postgres documentation:
|
I noticed two effects during benchmarking:
So that I end up with 13515 vs. 17354 queries/sec. |
@mp911de I think we can use NettyOutbound.sendGroups instead of those composite messages. It works just as you want and we already submit our messages grouped in publishers.
|
Thanks for having a look. That looks good. Probably we need another The performance gains from my branch are okay but that's not a terribly huge increase in throughout. Especially the |
I merged parts of the branch into
Let's have another benchmark round with |
…CP packet Extended flow parse+flush,execute+flush, execute+close+sync, and bind+describe messages are now sent as single TCP packet to improve TCP efficiency. [#138]
…CP packet Extended flow parse+flush,execute+flush, execute+close+sync, and bind+describe messages are now sent as single TCP packet to improve TCP efficiency. [#138]
Related to #334, we should optimize more aggressively for the default case to combine extended flow packets if there's a single parameter binding set. |
|
I benchmarked r2dbc-postgresql and JDBC driver for PostgreSQL to find out if r2dbc is suitable for online transaction processing, but the result of r2dbc was not as good as that of JDBC.
these are the benchmark test conditions:
parts of test code that use r2dbc are:
The test result is:
The text was updated successfully, but these errors were encountered: