
Occasional delays caused by TCP NoDelay config option defaulting to false #334

Closed

deblockt opened this issue Oct 5, 2020 · 25 comments

Labels: type: regression (A regression from a previous release)
Milestone: 0.8.6.RELEASE

Comments

deblockt commented Oct 5, 2020

Bug Report

Versions

  • Driver: 0.8.3.RELEASE - 0.8.4.RELEASE
  • Database: postgres
  • Java: 11
  • OS: Ubuntu

Current Behavior

Upgrading r2dbc-postgresql from 0.8.3.RELEASE to 0.8.4.RELEASE causes performance issues.
On our API we see roughly a doubling of response times.

Example Zipkin traces from the API.
With 0.8.3.RELEASE

[Screenshot: Zipkin trace, 2020-10-05 15-29-30]

With 0.8.4.RELEASE

[Screenshot: Zipkin trace, 2020-10-05 15-29-45]

You can see that, with 0.8.4.RELEASE, the SQL queries are executed faster, but there are gaps between the individual executions.

Table schema

Steps to reproduce

I have created a project to reproduce this issue.

0.8.3.RELEASE: https://github.com/deblockt/r2dbc-issue-reproducer/tree/perf-issue-0.8.3
0.8.4.RELEASE: https://github.com/deblockt/r2dbc-issue-reproducer/tree/perf-issue-0.8.4

This project performs a number of insertions into the database (a table with one column).
On my laptop the script takes 200ms with 0.8.3.RELEASE and 500ms with 0.8.4.RELEASE.

Expected behavior/code

No performance regression with the 0.8.4.RELEASE version.

Possible Solution

Additional context

deblockt added the status: waiting-for-triage (An issue we've not yet triaged) label on Oct 5, 2020
mp911de (Collaborator) commented Oct 5, 2020

Between 0.8.3 and 0.8.4 we fixed a few deadlocks and ensured proper backpressure propagation. Do you have an idea what's happening in the huge gap between the two blocks of queries? Can you take a thread dump and optionally add a bit of logging to see where the application gets stuck?

Other than that, we need to debug/profile the driver to find out what's going on.

deblockt (Author) commented Oct 5, 2020

I don't know what's happening in the gap. There is nothing there; the code looks like this:

firstBlock()
 .flatMap(result -> secondBlock())

I will try to take a thread dump.
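
For illustration only: one way to add the logging mp911de suggested is Reactor's log() operator, which timestamps onSubscribe/onNext/onComplete signals so the gap becomes visible in the application log. This is a minimal sketch; firstBlock/secondBlock are placeholders standing in for the application's real publishers:

import reactor.core.publisher.Mono;

public class GapLogging {

    public static void main(String[] args) {
        // wrap both blocks with log(); the gap then shows up between
        // firstBlock's onComplete and secondBlock's onSubscribe
        firstBlock()
                .log("firstBlock")
                .flatMap(result -> secondBlock().log("secondBlock"))
                .block();
    }

    // placeholder stand-ins for the real reactive blocks of the application
    static Mono<String> firstBlock() {
        return Mono.just("first");
    }

    static Mono<String> secondBlock() {
        return Mono.just("second");
    }
}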

deblockt (Author) commented Oct 6, 2020

No thread dump for now, but I have created a very simple endpoint (only one insertion) that shows the issue.

The example is available in this project: https://github.com/deblockt/r2dbc-issue-reproducer/tree/perf-issue-0.8.3 for version 0.8.3 and https://github.com/deblockt/r2dbc-issue-reproducer/tree/perf-issue-0.8.4 for version 0.8.4.

I will try to take a thread dump this evening.

mp911de (Collaborator) commented Oct 7, 2020

Thanks a lot. What immediately becomes visible is a gap of several tens of milliseconds between the log outputs:

2020-10-07 10:50:17.674 DEBUG [-,93cf6dd2e00c1412,93cf6dd2e00c1412,true] 521405 --- [tor-tcp-epoll-5] io.r2dbc.postgresql.QUERY                : Executing query: INSERT INTO applied_message (id) VALUES ($1) RETURNING *
2020-10-07 10:50:17.715  INFO [-,cde0e5665f4fe46d,cde0e5665f4fe46d,true] 521405 --- [tor-tcp-epoll-5] r2dbc.reproducer.TestController          : after the insertion

deblockt (Author) commented Oct 7, 2020

Yes, these tens of milliseconds appear only on versions from 0.8.4.RELEASE onward,
but nothing happens between the log statement and the query:

this.insert(id)
    .doOnNext(it -> log.info("after the insertion"))

public Mono<String> insert(String id) {
    return databaseClient
            .insert()
            .into("applied_message")
            .value("id", id)
            .then()
            .thenReturn(id);
}

mp911de (Collaborator) commented Oct 8, 2020

That's weird, locally I'm not able to reproduce the issue:

2020-10-08 11:17:51.123 DEBUG [-,d2c3ca9a1d73d4ce,d2c3ca9a1d73d4ce,true] 67184 --- [ctor-http-nio-3] o.s.d.r.c.R2dbcTransactionManager        : Creating new transaction with name [r2dbc.reproducer.TestController.simulate]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT
2020-10-08 11:17:51.123 DEBUG [-,d2c3ca9a1d73d4ce,d2c3ca9a1d73d4ce,true] 67184 --- [ctor-http-nio-3] o.s.d.r.c.R2dbcTransactionManager        : Acquired Connection [MonoLift] for R2DBC transaction
2020-10-08 11:17:51.123 DEBUG [-,d2c3ca9a1d73d4ce,d2c3ca9a1d73d4ce,true] 67184 --- [ctor-http-nio-3] o.s.d.r.c.R2dbcTransactionManager        : Switching R2DBC Connection [PooledConnection-proxy [PooledConnection[PostgresqlConnection{client=io.r2dbc.postgresql.client.ReactorNettyClient@371f0b2a, codecs=io.r2dbc.postgresql.codec.DefaultCodecs@47cc9ad9}]]] to manual commit
2020-10-08 11:17:51.123 DEBUG [-,d2c3ca9a1d73d4ce,d2c3ca9a1d73d4ce,true] 67184 --- [ctor-http-nio-3] io.r2dbc.postgresql.QUERY                : Executing query: BEGIN
2020-10-08 11:17:51.124  INFO [-,d2c3ca9a1d73d4ce,d2c3ca9a1d73d4ce,true] 67184 --- [actor-tcp-nio-2] r2dbc.reproducer.TestController          : before insert data
2020-10-08 11:17:51.124 DEBUG [-,d2c3ca9a1d73d4ce,d2c3ca9a1d73d4ce,true] 67184 --- [actor-tcp-nio-2] o.s.d.r2dbc.core.DefaultDatabaseClient   : Executing SQL statement [INSERT INTO applied_message (id) VALUES ($1)]
2020-10-08 11:17:51.124 DEBUG [-,d2c3ca9a1d73d4ce,d2c3ca9a1d73d4ce,true] 67184 --- [actor-tcp-nio-2] io.r2dbc.postgresql.QUERY                : Executing query: INSERT INTO applied_message (id) VALUES ($1) RETURNING *
2020-10-08 11:17:51.126  INFO [-,d2c3ca9a1d73d4ce,d2c3ca9a1d73d4ce,true] 67184 --- [actor-tcp-nio-2] r2dbc.reproducer.TestController          : after the insertion
2020-10-08 11:17:51.126  INFO [-,d2c3ca9a1d73d4ce,d2c3ca9a1d73d4ce,true] 67184 --- [actor-tcp-nio-2] r2dbc.reproducer.TestController          : after creating the response entity
2020-10-08 11:17:51.126 TRACE [-,d2c3ca9a1d73d4ce,d2c3ca9a1d73d4ce,true] 67184 --- [actor-tcp-nio-2] o.s.d.r.c.R2dbcTransactionManager        : Triggering beforeCommit synchronization
2020-10-08 11:17:51.126 TRACE [-,d2c3ca9a1d73d4ce,d2c3ca9a1d73d4ce,true] 67184 --- [actor-tcp-nio-2] o.s.d.r.c.R2dbcTransactionManager        : Triggering beforeCompletion synchronization
2020-10-08 11:17:51.126 DEBUG [-,d2c3ca9a1d73d4ce,d2c3ca9a1d73d4ce,true] 67184 --- [actor-tcp-nio-2] o.s.d.r.c.R2dbcTransactionManager        : Initiating transaction commit
2020-10-08 11:17:51.126 DEBUG [-,d2c3ca9a1d73d4ce,d2c3ca9a1d73d4ce,true] 67184 --- [actor-tcp-nio-2] o.s.d.r.c.R2dbcTransactionManager        : Committing R2DBC transaction on Connection [PooledConnection-proxy [PooledConnection[PostgresqlConnection{client=io.r2dbc.postgresql.client.ReactorNettyClient@371f0b2a, codecs=io.r2dbc.postgresql.codec.DefaultCodecs@47cc9ad9}]]]
2020-10-08 11:17:51.126 DEBUG [-,d2c3ca9a1d73d4ce,d2c3ca9a1d73d4ce,true] 67184 --- [actor-tcp-nio-2] io.r2dbc.postgresql.QUERY                : Executing query: COMMIT
2020-10-08 11:17:51.126 TRACE [-,d2c3ca9a1d73d4ce,d2c3ca9a1d73d4ce,true] 67184 --- [actor-tcp-nio-2] o.s.d.r.c.R2dbcTransactionManager        : Triggering afterCommit synchronization
2020-10-08 11:17:51.126 TRACE [-,d2c3ca9a1d73d4ce,d2c3ca9a1d73d4ce,true] 67184 --- [actor-tcp-nio-2] o.s.d.r.c.R2dbcTransactionManager        : Triggering afterCompletion synchronization
2020-10-08 11:17:51.127 DEBUG [-,d2c3ca9a1d73d4ce,d2c3ca9a1d73d4ce,true] 67184 --- [actor-tcp-nio-2] o.s.d.r.c.R2dbcTransactionManager        : Releasing R2DBC Connection [PooledConnection-proxy [PooledConnection[PostgresqlConnection{client=io.r2dbc.postgresql.client.ReactorNettyClient@371f0b2a, codecs=io.r2dbc.postgresql.codec.DefaultCodecs@47cc9ad9}]]] after transaction
2020-10-08 11:17:51.127  INFO [-,d2c3ca9a1d73d4ce,d2c3ca9a1d73d4ce,true] 67184 --- [ctor-http-nio-3] r.reproducer.StartAndEndLogWebFilter     : INTERNAL end request to /test with status code 200 OK. duration: 4ms

I'm not sure whether/how much Spring Cloud Sleuth and others play into that. I'd suggest reducing the test case to a minimum (e.g. removing Sleuth, which decorates a lot of publishers) and optionally using the R2DBC SPI directly to pin the issue down.

mp911de added the status: waiting-for-feedback (We need additional information before we can continue) label on Oct 8, 2020
deblockt (Author) commented Oct 9, 2020

Hi, I have removed some useless dependencies from the test project; nothing changes locally.

I have tried a few scenarios to see whether the issue comes from r2dbc:
- remove @Transactional and replace the request with a Mono.delay of 5ms
- keep @Transactional and replace the request with a Mono.delay of 5ms
- use @Transactional and the request
- use @Transactional and run the request through the R2DBC SPI
- remove @Transactional and keep the request

The performance results are in this table:

| usecase | 0.8.3 | 0.8.4 |
| --- | --- | --- |
| transaction + request | avg: 4ms, min: 2ms, max: 216ms | avg: 31ms, min: 4ms, max: 217ms |
| request without @Transactional | avg: 3ms, min: 1ms, max: 217ms | avg: 21ms, min: 2ms, max: 216ms |
| R2DBC SPI + @Transactional with ConnectionAccessor | avg: 4ms, min: 2ms, max: 35ms | avg: 31ms, min: 4ms, max: 101ms |
| transaction + Mono.delay 5ms | avg: 9ms, min: 6ms, max: 217ms | avg: 9ms, min: 6ms, max: 217ms |
| Mono.delay 5ms | avg: 7ms, min: 5ms, max: 133ms | avg: 7ms, min: 5ms, max: 116ms |

It's strange that you don't get the same results as me; I will try to run the API in Docker so you can reproduce.

I use this code to test with the R2DBC SPI:

package r2dbc.reproducer;

import io.r2dbc.spi.Result;
import lombok.extern.slf4j.Slf4j;
import org.springframework.data.r2dbc.core.ConnectionAccessor;
import org.springframework.data.r2dbc.core.DatabaseClient;
import org.springframework.http.ResponseEntity;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

import java.time.Duration;
import java.util.UUID;

@Slf4j
@RestController
public class TestController {
    private final ConnectionAccessor databaseClient;

    public TestController(DatabaseClient client) {
        this.databaseClient = (ConnectionAccessor) client;
    }

    @Transactional
    @GetMapping("/test")
    public Mono<ResponseEntity<String>> simulate() {
        final var id = UUID.randomUUID().toString();
        return
            Mono.defer(() -> {log.info("before insert data (R2DBC 0.8.4)"); return Mono.empty();})
                .then(this.insert(id))
                .doOnNext(it -> log.info("after the insertion"))
                .map(r -> ResponseEntity.ok("ok " + id))
                .doOnNext(it -> log.info("after creating the response entity"));
    }

    public Mono<String> insert(String id) {
        return databaseClient.inConnection(con -> {
            final var query = con.createStatement("insert into applied_message(id) values($1)")
                    .bind(0, id)
                    .execute();
            return Mono.from(query)
                    .flatMap(result -> Mono.from(result.getRowsUpdated()));
        })
        .thenReturn(id);
    }
}

deblockt (Author) commented Oct 9, 2020

I set the log level to TRACE, and I can see this:


2020-10-09 09:22:38.141  INFO 233168 --- [tor-tcp-epoll-2] r2dbc.reproducer.TestController          : before insert data (R2DBC 0.8.4)
2020-10-09 09:22:38.141 TRACE 233168 --- [tor-tcp-epoll-2] .s.t.r.TransactionSynchronizationManager : Retrieved value [org.springframework.data.r2dbc.connectionfactory.ConnectionHolder@e8ea1a2] for key [ConnectionPool[PostgreSQL]] bound to context [5dc7bef7-9d3a-49f0-b2ec-335a7a2e9344: r2dbc.reproducer.TestController.simulate]
2020-10-09 09:22:38.141 DEBUG 233168 --- [tor-tcp-epoll-2] io.r2dbc.postgresql.QUERY                : Executing query: insert into applied_message(id) values($1)
2020-10-09 09:22:38.141 DEBUG 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : Request:  Bind{name='B_2119', parameterFormats=[FORMAT_TEXT], parameters=[CompositeByteBuf(ridx: 0, widx: 36, cap: 36, components=1)], resultFormats=[], source='S_0'}
2020-10-09 09:22:38.141 TRACE 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : [id: 0x64f967a8, L:/127.0.0.1:53332 - R:localhost/127.0.0.1:5432] WRITE: 64B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| 42 00 00 00 3f 42 5f 32 31 31 39 00 53 5f 30 00 |B...?B_2119.S_0.|
|00000010| 00 01 00 00 00 01 00 00 00 24 66 65 30 63 66 39 |.........$fe0cf9|
|00000020| 35 31 2d 38 64 65 37 2d 34 39 38 34 2d 39 31 61 |51-8de7-4984-91a|
|00000030| 64 2d 65 65 33 66 30 32 66 35 38 39 64 32 00 00 |d-ee3f02f589d2..|
+--------+-------------------------------------------------+----------------+
2020-10-09 09:22:38.141 TRACE 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : [id: 0x64f967a8, L:/127.0.0.1:53332 - R:localhost/127.0.0.1:5432] FLUSH
2020-10-09 09:22:38.141 DEBUG 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : Request:  Describe{name='B_2119', type=PORTAL}
2020-10-09 09:22:38.141 TRACE 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : [id: 0x64f967a8, L:/127.0.0.1:53332 - R:localhost/127.0.0.1:5432] WRITE: 13B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| 44 00 00 00 0c 50 42 5f 32 31 31 39 00          |D....PB_2119.   |
+--------+-------------------------------------------------+----------------+
2020-10-09 09:22:38.141 TRACE 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : [id: 0x64f967a8, L:/127.0.0.1:53332 - R:localhost/127.0.0.1:5432] FLUSH
2020-10-09 09:22:38.141 DEBUG 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : Request:  Execute{name='B_2119', rows=0}
2020-10-09 09:22:38.141 TRACE 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : [id: 0x64f967a8, L:/127.0.0.1:53332 - R:localhost/127.0.0.1:5432] WRITE: 16B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| 45 00 00 00 0f 42 5f 32 31 31 39 00 00 00 00 00 |E....B_2119.....|
+--------+-------------------------------------------------+----------------+
2020-10-09 09:22:38.141 TRACE 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : [id: 0x64f967a8, L:/127.0.0.1:53332 - R:localhost/127.0.0.1:5432] FLUSH
2020-10-09 09:22:38.141 DEBUG 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : Request:  Close{name='B_2119', type=PORTAL}
2020-10-09 09:22:38.141 TRACE 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : [id: 0x64f967a8, L:/127.0.0.1:53332 - R:localhost/127.0.0.1:5432] WRITE: 13B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| 43 00 00 00 0c 50 42 5f 32 31 31 39 00          |C....PB_2119.   |
+--------+-------------------------------------------------+----------------+
2020-10-09 09:22:38.141 TRACE 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : [id: 0x64f967a8, L:/127.0.0.1:53332 - R:localhost/127.0.0.1:5432] FLUSH
2020-10-09 09:22:38.141 DEBUG 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : Request:  Sync{}
2020-10-09 09:22:38.141 TRACE 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : [id: 0x64f967a8, L:/127.0.0.1:53332 - R:localhost/127.0.0.1:5432] WRITE: 5B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| 53 00 00 00 04                                  |S....           |
+--------+-------------------------------------------------+----------------+
2020-10-09 09:22:38.141 TRACE 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : [id: 0x64f967a8, L:/127.0.0.1:53332 - R:localhost/127.0.0.1:5432] FLUSH
2020-10-09 09:22:38.141 TRACE 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : [id: 0x64f967a8, L:/127.0.0.1:53332 - R:localhost/127.0.0.1:5432] READ COMPLETE
2020-10-09 09:22:38.183 TRACE 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : [id: 0x64f967a8, L:/127.0.0.1:53332 - R:localhost/127.0.0.1:5432] READ: 37B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| 32 00 00 00 04 6e 00 00 00 04 43 00 00 00 0f 49 |2....n....C....I|
|00000010| 4e 53 45 52 54 20 30 20 31 00 33 00 00 00 04 5a |NSERT 0 1.3....Z|
|00000020| 00 00 00 05 54                                  |....T           |
+--------+-------------------------------------------------+----------------+
2020-10-09 09:22:38.183 DEBUG 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : Response: BindComplete{}
2020-10-09 09:22:38.183 DEBUG 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : Response: NoData{}
2020-10-09 09:22:38.183 DEBUG 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : Response: CommandComplete{command=INSERT, rowId=0, rows=1}
2020-10-09 09:22:38.183 DEBUG 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : Response: CloseComplete{}
2020-10-09 09:22:38.183 DEBUG 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : Response: ReadyForQuery{transactionStatus=TRANSACTION}
2020-10-09 09:22:38.183 TRACE 233168 --- [tor-tcp-epoll-2] .s.t.r.TransactionSynchronizationManager : Retrieved value [org.springframework.data.r2dbc.connectionfactory.ConnectionHolder@e8ea1a2] for key [ConnectionPool[PostgreSQL]] bound to context [5dc7bef7-9d3a-49f0-b2ec-335a7a2e9344: r2dbc.reproducer.TestController.simulate]
2020-10-09 09:22:38.183  INFO 233168 --- [tor-tcp-epoll-2] r2dbc.reproducer.TestController          : after the insertion

Sometimes (not every time) this step can take 20ms:

2020-10-09 09:22:38.141 TRACE 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : [id: 0x64f967a8, L:/127.0.0.1:53332 - R:localhost/127.0.0.1:5432] FLUSH
2020-10-09 09:22:38.141 TRACE 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : [id: 0x64f967a8, L:/127.0.0.1:53332 - R:localhost/127.0.0.1:5432] READ COMPLETE
2020-10-09 09:22:38.183 TRACE 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : [id: 0x64f967a8, L:/127.0.0.1:53332 - R:localhost/127.0.0.1:5432] READ: 37B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| 32 00 00 00 04 6e 00 00 00 04 43 00 00 00 0f 49 |2....n....C....I|
|00000010| 4e 53 45 52 54 20 30 20 31 00 33 00 00 00 04 5a |NSERT 0 1.3....Z|
|00000020| 00 00 00 05 54                                  |....T           |
+--------+-------------------------------------------------+----------------+
2020-10-09 09:22:38.183 DEBUG 233168 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : Response: BindComplete{}

mp911de (Collaborator) commented Oct 14, 2020

There's a lot of additional background detail available in spring-projects/spring-data-r2dbc#477. The recorded packet capture shows that the TCP ACK for BIND has a 20-40ms delay, which is weird in the first place because it suggests that the server receive buffers are saturated or something of that sort.

mp911de (Collaborator) commented Oct 15, 2020

@cambierr I pushed a change where several command flow sequences are combined into single TCP packets (e.g. BIND + Describe). That comes closer to what the JDBC driver does, although some bits (combining bind+describe with execute/close/sync) require a bit more effort to avoid individual TCP packets where possible.

Care to retest against 0.8.6.BUILD-SNAPSHOT, available from repo.spring.io?

cambierr commented:

Just tested and... no change at all: I graphed both results and they are exactly the same :/

mp911de (Collaborator) commented Oct 15, 2020

Do you mind joining the Postgres Slack at https://postgres-slack.herokuapp.com/? I think we need someone from the Postgres core team who can help us; the delayed ACK in particular is something that draws my attention.

cambierr commented Oct 15, 2020

Just joined and saw you posted already :)

mp911de (Collaborator) commented Oct 16, 2020

FTR: Increasing demand (ensuring there's always demand in FluxReceive to avoid ChannelConfig.setAutoRead(false)) and switching to NIO (instead of epoll) do not fix the issue.

The issue can be reproduced by using a single connection and R2DBC Postgres directly (R2DBC Pool and Spring Data R2DBC don't affect the issue).
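
For reference, a minimal sketch of such a direct-driver reproducer (illustrative only: host, credentials, and the iteration count are assumptions; the applied_message table is the one used by the reproducer project):

import io.r2dbc.postgresql.PostgresqlConnectionConfiguration;
import io.r2dbc.postgresql.PostgresqlConnectionFactory;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

import java.time.Duration;
import java.util.UUID;

public class DirectDriverReproducer {

    public static void main(String[] args) {
        PostgresqlConnectionFactory connectionFactory = new PostgresqlConnectionFactory(
                PostgresqlConnectionConfiguration.builder()
                        .host("localhost")       // placeholder connection settings
                        .port(5432)
                        .username("postgres")
                        .password("postgres")
                        .database("postgres")
                        .build());

        long start = System.currentTimeMillis();

        connectionFactory.create()
                .flatMapMany(connection -> Flux.range(0, 100)
                        // run the inserts sequentially on the single connection
                        .concatMap(i -> Mono.from(connection.createStatement(
                                        "INSERT INTO applied_message (id) VALUES ($1)")
                                .bind("$1", UUID.randomUUID().toString())
                                .execute())
                                .flatMap(result -> Mono.from(result.getRowsUpdated())))
                        .then(Mono.from(connection.close())))
                .blockLast(Duration.ofSeconds(30));

        System.out.println("took " + (System.currentTimeMillis() - start) + "ms");
    }
}

On an affected Linux system the elapsed time should differ noticeably between 0.8.3.RELEASE and 0.8.4/0.8.5, even though the code is identical.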

cambierr commented:

Here is a network capture of both the 0.8.3 and 0.8.5 benchmarks:
benchmark.zip

cambierr commented:

Simplest reproducer available at https://github.com/cambierr/r2dbc-perf-issue

mp911de (Collaborator) commented Oct 16, 2020

Update: The reason the issue was not reproducible is that there's a behavioral difference between MacOS and Linux. Can reproduce the issue on Ubuntu 20, Postgres 12, Java 8.

mp911de (Collaborator) commented Oct 16, 2020

The commit introducing this behavior is 233f16e (TCP KeepAlive and TCP NoDelay)

mp911de (Collaborator) commented Oct 16, 2020

Yeah, enabling TCP NoDelay makes the issue disappear.

mp911de added the type: regression (A regression from a previous release) label and removed the status: waiting-for-feedback and status: waiting-for-triage labels on Oct 16, 2020
mp911de changed the title from "performance issue between 0.8.3.RELEASE and 0.8.4.RELEASE" to "Occasional delays caused by TCP NoDelay config option defaulting to false" on Oct 16, 2020
mp911de added this to the 0.8.6.RELEASE milestone on Oct 16, 2020
mp911de (Collaborator) commented Oct 16, 2020

FWIW, https://www.extrahop.com/company/blog/2016/tcp-nodelay-nagle-quickack-best-practices describes the delayed ACK behavior.
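
For illustration, TCP NoDelay is the standard socket option that disables Nagle's algorithm; at the plain java.net.Socket level (unrelated to the driver's own configuration API) it corresponds to the following sketch:

import java.io.IOException;
import java.net.Socket;

public class NoDelayIllustration {

    public static void main(String[] args) throws IOException {
        // TCP_NODELAY disables Nagle's algorithm, so small writes are sent
        // immediately instead of being held back until the previous segment
        // has been ACKed; combined with delayed ACKs on the receiver, Nagle
        // can introduce exactly the 20-40ms stalls observed above.
        try (Socket socket = new Socket("localhost", 5432)) {
            socket.setTcpNoDelay(true);
            System.out.println("TCP_NODELAY enabled: " + socket.getTcpNoDelay());
        }
    }
}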

mp911de (Collaborator) commented Oct 16, 2020

PGJDBC has TCP NoDelay enabled so it makes sense to flip the default here as well.

mp911de (Collaborator) commented Oct 16, 2020

We're going to flip the configuration default and enable TCP NoDelay by default. A workaround is enabling TCP NoDelay on affected systems.
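
For affected systems, a sketch of that workaround on the connection configuration (assuming the tcpNoDelay builder option introduced together with tcpKeepAlive in 0.8.4; connection settings are placeholders):

import io.r2dbc.postgresql.PostgresqlConnectionConfiguration;
import io.r2dbc.postgresql.PostgresqlConnectionFactory;

public class TcpNoDelayWorkaround {

    // Enable TCP NoDelay explicitly until the default is flipped in 0.8.6.RELEASE.
    static PostgresqlConnectionFactory connectionFactory() {
        return new PostgresqlConnectionFactory(
                PostgresqlConnectionConfiguration.builder()
                        .host("localhost")     // placeholder connection settings
                        .port(5432)
                        .username("postgres")
                        .password("postgres")
                        .database("postgres")
                        .tcpNoDelay(true)      // assumed option name, added alongside tcpKeepAlive
                        .build());
    }
}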

mp911de added a commit that referenced this issue Oct 16, 2020
We now enable TCP NoDelay by default to avoid delayed TCP ACKs. Our flushing behavior isn't ideal yet, but combining messages allows us to reduce the number of TCP packets.

[resolves #334][#138]
deblockt (Author) commented:

It seems that response times are better, but still slower than with the 0.8.3 version (there are still gaps between requests).
I will try to provide a sample and open another issue if needed.

Thanks.

mp911de (Collaborator) commented Oct 16, 2020

Thanks @deblockt. Feel free to file a new ticket once you can provide a reproducer, since this one already has quite a few comments regarding delayed TCP ACKs.
