-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Occasional delays caused by TCP NoDelay config option defaulting to false #334
Comments
Between 0.8.3 and 0.8.4 we changed a few deadlocks and ensured proper backpressure propagation. Do you have an idea what's happening in the huge gap between the two blocks of queries? Can you take a thread dump and optionally add a bit of logging to see where the application gets stuck? Other than that, we need to debug/profile the driver to find out what's going on. |
I don't know what's happening in the gap. there are nothing, the code look like that :
I will try to take a thread dump. |
No thread dump for now. But I have create a very simple endpoint (only one insertion) to see the issue. The exemple is available on this project https://github.com/deblockt/r2dbc-issue-reproducer/tree/perf-issue-0.8.3 for version 0.8.3 and https://github.com/deblockt/r2dbc-issue-reproducer/tree/perf-issue-0.8.4 for versoin 0.8.4 I will try to do a thread dump in the evening |
@mp911de Thread dump and trace sample have been added on sample project, you can find it:
|
Thanks a lot. What immediately becomes visible is the gap between the log outputs of several 10's of ms:
|
Yes, this 10ms appear only on version > 0.8.4.RELEASE. this.insert(id)
.doOnNext(it -> log.info("after the insertion"))
public Mono<String> insert(String id) {
return databaseClient.
insert()
.into("applied_message")
.value("id", id)
.then()
.thenReturn(id);
} |
That's weird, locally I'm not able to reproduce the issue:
I'm not sure whether/how much Spring Cloud Sleuth and others play into that. I'd suggest to reduce the test case to a minimum (e.g. removing sleuth that decorates a lot of publishers) and optionally using R2DBC SPI directly to pin the issue down. |
Hi, I have remove some of useless deps on the test project, nothing change localy. I have try to test some scenario to see if the issue come from r2dbc, I have try: The perf results are in this table:
It's strange that you didn't have the same results as me; I will try to run the API on a docker to see if you can reproduce. I use this code to test with the R2DBC SPI: package r2dbc.reproducer;
import io.r2dbc.spi.Result;
import lombok.extern.slf4j.Slf4j;
import org.springframework.data.r2dbc.core.ConnectionAccessor;
import org.springframework.data.r2dbc.core.DatabaseClient;
import org.springframework.http.ResponseEntity;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;
import java.time.Duration;
import java.util.UUID;
@Slf4j
@RestController
public class TestController {
private final ConnectionAccessor databaseClient;
public TestController(DatabaseClient client) {
this.databaseClient = (ConnectionAccessor) client;
}
@Transactional
@GetMapping("/test")
public Mono<ResponseEntity<String>> simulate() {
final var id = UUID.randomUUID().toString();
return
Mono.defer(() -> {log.info("before insert data (R2DBC 0.8.4)"); return Mono.empty();})
.then(this.insert(id))
.doOnNext(it -> log.info("after the insertion"))
.map(r -> ResponseEntity.ok("ok " + id))
.doOnNext(it -> log.info("after creating the response entity"));
}
public Mono<String> insert(String id) {
return databaseClient.inConnection(con -> {
final var query = con.createStatement("insert into applied_message(id) values($1)")
.bind(0, id)
.execute();
return Mono.from(query)
.flatMap(result -> Mono.from(result.getRowsUpdated()));
})
.thenReturn(id);
}
} |
I set the log level to trace, and I can see that:
Some time this step can take 20ms (not all the time):
|
There's a lot of additional background detail available from spring-projects/spring-data-r2dbc#477. The recorded packet capture shows that the |
@cambierr I pushed a change where several command flow sequences are combined into single TCP packets (e.g. BIND + Describe). That comes closer to what the JDBC driver does, although some bits (combining bind+describe with execute/close/sync) require a bit more effort to avoid individual TCP packets where possible. Care to retest against |
just tested and.... no change at all: I graphed both results and they are exactly the same :/ |
Do you mind joining Postgres slack at https://postgres-slack.herokuapp.com/? I think we need someone who is able to help us from the Postgres core team, especially the delayed |
Just joined and saw you posted already :) |
FTR: Increasing demand (ensuring there's always demand in The issue can be reproduced by using a single connection and R2DBC Postgres directly (R2DBC Pool and Spring Data R2DBC don't affect the issue). |
Here is a network capture of oth 0.8.3 and 0.8.5 benchmarks |
simplest reproducer available at https://github.com/cambierr/r2dbc-perf-issue |
Update: The reason the issue was not reproducible is that there's a behavioral difference between MacOS and Linux. Can reproduce the issue on Ubuntu 20, Postgres 12, Java 8. |
The commit introducing this behavior is 233f16e (TCP KeepAlive and TCP NoDelay) |
Yeah, enabling TCP NoDelay lets the issue disappear. |
FWIW, https://www.extrahop.com/company/blog/2016/tcp-nodelay-nagle-quickack-best-practices describes the delayed |
PGJDBC has TCP NoDelay enabled so it makes sense to flip the default here as well. |
We're going to flip the configuration default and enable TCP NoDelay by default. A workaround is enabling TCP NoDelay on affected systems. |
It seem that rensponse times is better. but always slower than with 0.8.3 Version (there always have gap between requests). Thanks. |
Thanks @deblockt. Feel free to file a new ticket once you can provide a reproducer since this one has quite some comments regarding delayed TCP ACKs. |
Bug Report
Versions
Current Behavior
Upgrading from
0.8.3.REALASE
to0.8.4.RELEASE
ofr2dbc-postgresql
causes performance issues.On our api we can see *2 of response time.
Exemple of zipkin trace of api.
With 0.8.3.RELEASE
With 0.8.4.RELEASE
you can see that, with
0.8.4.RELEASE
, SQL queries are executed faster, but there are gaps between the different executions.Table schema
Steps to reproduce
I have created a project to see this issue.
0.8.3.RELEASE: https://github.com/deblockt/r2dbc-issue-reproducer/tree/perf-issue-0.8.3
0.8.4.RELEASE: https://github.com/deblockt/r2dbc-issue-reproducer/tree/perf-issue-0.8.4
This project perform some insertion on database (table with one column).
On my laptop using the 0.8.3.RELEASE version this script takes 200ms, with 0.8.4.RELEASE it takes 500ms.
Expected behavior/code
have no performance issue with the 0.8.4.RELEASE version
Possible Solution
Additional context
The text was updated successfully, but these errors were encountered: