Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate flaky test-http-server-request-timeouts-mixed #43465

Open
F3n67u opened this issue Jun 18, 2022 · 15 comments · Fixed by #45722
Open

Investigate flaky test-http-server-request-timeouts-mixed #43465

F3n67u opened this issue Jun 18, 2022 · 15 comments · Fixed by #45722
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI.

Comments

@F3n67u
Copy link
Member

F3n67u commented Jun 18, 2022

Test

test-http-server-request-timeouts-mixed

Platform

macos

Console output

not ok 1173 parallel/test-http-server-request-timeouts-mixed
  ---
  duration_ms: 3.244
  severity: fail
  exitcode: 1
  stack: |-
    node:assert:399
        throw err;
        ^
    
    AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:
    
      assert(request2.completed)
    
        at Timeout._onTimeout (/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx11-x64/test/parallel/test-http-server-request-timeouts-mixed.js:106:5)
        at listOnTimeout (node:internal/timers:564:17)
        at process.processTimers (node:internal/timers:507:7) {
      generatedMessage: true,
      code: 'ERR_ASSERTION',
      actual: false,
      expected: true,
      operator: '=='
    }
    
    Node.js v19.0.0-pre
  ...

Build links

Failed PR

10 (nodejs/node#43329, nodejs/node#43380, nodejs/node#43366, nodejs/node#43455, nodejs/node#43190, nodejs/node#43176, nodejs/node#43374, nodejs/node#43363, nodejs/node#43417, nodejs/node#43216)

Appeared

test-orka-macos11-x64-1, test-nearform-macos10.15-x64-3, test-orka-macos10.15-x64-2, test-nearform-macos10.15-x64-1, test-orka-macos11-x64-2

First CI

https://ci.nodejs.org/job/node-test-pull-request/44587/

Last CI

https://ci.nodejs.org/job/node-test-pull-request/44664/

Example

https://ci.nodejs.org/job/node-test-commit-osx/nodes=osx11-x64/45608/console

Additional information

Build links info is from CI Reliability 2022-06-18

This test was introduced on 2022-05-03: #42893

This test starts flaky from 2022-05-04: nodejs/reliability#271

Query all flaky dates: https://github.com/nodejs/reliability/issues?q=is%3Aissue+is%3Aopen+test-http-server-request-timeouts-mixed

cc @ShogunPanda(Test author) @lpinca(Reviewer, I saw you mention the flaky possibility in pr comment)

@F3n67u F3n67u added the flaky-test Issues and PRs related to the tests with unstable failures on the CI. label Jun 18, 2022
@lpinca
Copy link
Member

lpinca commented Jun 18, 2022

It seems this only occurs on macOS but I am not able to reproduce locally on macOS 12.4.

@mcollina
Copy link
Member

@ShogunPanda could d you take a look?

@ShogunPanda
Copy link
Contributor

Yup, I will take a look on Monday!

@F3n67u
Copy link
Member Author

F3n67u commented Jun 18, 2022

It seems this only occurs on macOS but I am not able to reproduce locally on macOS 12.4.

Yes, I also tried to reproduce on my local MacOS 12.4 but never reproduced.

ShogunPanda pushed a commit that referenced this issue Jun 28, 2022
targos pushed a commit that referenced this issue Jul 12, 2022
targos pushed a commit that referenced this issue Jul 31, 2022
@nicksia-vgw
Copy link
Contributor

nicksia-vgw commented Aug 5, 2022

I was able to repro on Mac 12.4:

=== release test-http-server-request-timeouts-mixed ===                       
Path: parallel/test-http-server-request-timeouts-mixed
node:assert:400
    throw err;
    ^

AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:

  assert(request2.completed)

    at Timeout._onTimeout (/Users/nicksia/Documents/git/node/test/parallel/test-http-server-request-timeouts-mixed.js:107:5)
    at listOnTimeout (node:internal/timers:564:17)
    at process.processTimers (node:internal/timers:507:7) {
  generatedMessage: true,
  code: 'ERR_ASSERTION',
  actual: false,
  expected: true,
  operator: '=='
}

Let me know if there's anything I can do to help, even if it's just being a lab rat :)

@ShogunPanda
Copy link
Contributor

Thanks sir! Do you mind sharing which machine (hardware I mean) were you running on and all the relevant environment information (like if you were on battery or if your machine was overloaded and so forth)

@nicksia-vgw
Copy link
Contributor

nicksia-vgw commented Aug 5, 2022

Sure thing - here's a screenshot of my system details:
image

My machine was plugged into the wall.

Test was running in CLion's integrated terminal.

I'd say "low-medium" utilisation, had Postgres running in a Docker image and a IDE/Browser open (about 20% CPU usage and 20GB memory used).

Update: Fails even with request2 isolated.

@nicksia-vgw
Copy link
Contributor

nicksia-vgw commented Aug 5, 2022

Here's some awfully crude dumps (sorry!), printing out the value of request2.completed every 100ms with setInterval.

Failed Run:

@[101ms] request2.completed: undefined
@[201ms] request2.completed: undefined
@[303ms] request2.completed: undefined
request2 client created.
request2.client.write()
@[413ms] request2.completed: false
<snip>
@[2840ms] request2.completed: false
node:assert:400
    throw err;
    ^

AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:

  assert(request2.completed)

    at Timeout._onTimeout (/Users/nicksia/Documents/git/node/test/parallel/test-http-server-request-timeouts-mixed.js:81:5)
    at listOnTimeout (node:internal/timers:564:17)
    at process.processTimers (node:internal/timers:507:7) {
  generatedMessage: true,
  code: 'ERR_ASSERTION',
  actual: false,
  expected: true,
  operator: '=='
}

Node.js v19.0.0-pre

Passing Run

@[102ms] request2.completed: undefined
@[203ms] request2.completed: undefined
@[304ms] request2.completed: undefined
request2 client created.
request2.client.write()
@[412ms] request2.completed: false
<snip>
@[2439ms] request2.completed: false
@[2540ms] request2.completed: true
<snip>
@[6481ms] request2.completed: true
Closing server.

My passing runs are close to 2500ms after client is created, when the headersTimeout * 1.2 + connectionsCheckingInterval on my platform is 2900ms.

Could it be that the check is just too early on slow/burdened systems?

Update: I'm getting a bunch of runs around the 3000ms mark when I remove assert(request2.completed):

@[3039ms] request2.completed: true

@nicksia-vgw
Copy link
Contributor

The times don't quite match up, which is causing this issue.

headersTimeout: 2000ms
requestTimeout: 4000ms
connectionsCheckingInterval: 500ms

Request 2 is started at headersTimeout * 0.2, or 400ms.
Headers time out 2000ms after that.
Completion of Request 2 is checked at at headersTimeout * 1.2 + connectionsCheckingInterval, or 2900ms.

Timeline
@[400ms]: Request 2 client created and first write sent.
@[2400ms]: Headers are timed out.
@[2900ms]: Request 2 completion is checked (which is intended to depend on Request 2 expiring via headersTimeout, as seen in the comment - assert(request2.response.startsWith(responseTimeout)); // It is expired due to headersTimeout.

The problem is that the connectionsCheckingInterval is slightly too high. The connection checking interval could tick just after 2400ms, and then just after 2900ms.

Example

@[2424ms] request2.completed: false
checking connection timeout
@[2525ms] request2.completed: false
@[2625ms] request2.completed: false
@[2727ms] request2.completed: false
@[2828ms] request2.completed: false
node:assert:400
    throw err;
    ^

AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:

  assert(request2.completed)

    at Timeout._onTimeout (/Users/nicksia/Documents/git/node/test/parallel/test-http-server-request-timeouts-mixed.js:108:5)
    at listOnTimeout (node:internal/timers:564:17)
    at process.processTimers (node:internal/timers:507:7) {
  generatedMessage: true,
  code: 'ERR_ASSERTION',
  actual: false,
  expected: true,
  operator: '=='
}

The fix is to ensure connection checking interval is less than 500ms, and to decouple the check timeout from connection checking interval.

I'll put up a fix for this shortly.

nicksia-vgw added a commit to nicksia-vgw/node that referenced this issue Aug 8, 2022
parallel/http-server-request-timeouts-mixed test was sometimes failing due to insufficient tolerance between the connection timeout checking interval, and the expected timeout specified in the test. 

The checking interval was 500ms, and the request was checked for timeout exactly 500ms after the request was expected to timeout. This led to a timing condition where the next check would occur slightly after the request was expected to timeout.

fixes: nodejs#43465
nicksia-vgw added a commit to nicksia-vgw/node that referenced this issue Aug 8, 2022
parallel/http-server-request-timeouts-mixed test was sometimes failing due to insufficient tolerance between the connection timeout checking interval, and the expected timeout specified in the test. 

The checking interval was 500ms, and the request was checked for timeout exactly 500ms after the request was expected to timeout. This led to a timing condition where the next check would occur slightly after the request was expected to timeout.

This change makes the checking interval more frequent, and decouples the timeout for the check from the checking interval, otherwise the issue would persist.

fixes: nodejs#43465
nicksia-vgw added a commit to nicksia-vgw/node that referenced this issue Aug 8, 2022
parallel/http-server-request-timeouts-mixed test was sometimes failing
due to insufficient tolerance between the connection timeout checking 
interval, and the expected timeout specified in the test.

The checking interval was 500ms, and the request was checked for 
timeout exactly 500ms after the request was expected to timeout. 
This led to a timing condition where the next check would occur 
slightly after the request was expected to timeout.

This change makes the checking interval more frequent, and decouples
the timeout for the check from the checking interval, otherwise the 
issue would persist.

fixes: nodejs#43465
nicksia-vgw added a commit to nicksia-vgw/node that referenced this issue Aug 8, 2022
parallel/http-server-request-timeouts-mixed test was sometimes failing
due to insufficient tolerance between the connection timeout checking
interval, and the expected timeout specified in the test.

This change makes the checking interval more frequent, and decouples
the timeout for the check from the checking interval.

fixes: nodejs#43465
guangwong pushed a commit to noslate-project/node that referenced this issue Oct 10, 2022
@anonrig
Copy link
Member

anonrig commented Oct 18, 2023

@richardlau
Copy link
Member

Failed on Ubuntu 22 - https://ci.nodejs.org/job/node-test-commit-linuxone/40388/

No, this one failed on rhel8-s390x.

anonrig added a commit to anonrig/node that referenced this issue Oct 18, 2023
anonrig added a commit to anonrig/node that referenced this issue Oct 18, 2023
nodejs-github-bot pushed a commit that referenced this issue Oct 18, 2023
Ref: #43465
PR-URL: #50227
Refs: #43465
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Vinícius Lourenço Claro Cardoso <[email protected]>
Reviewed-By: Marco Ippolito <[email protected]>
Reviewed-By: Filip Skokan <[email protected]>
targos pushed a commit that referenced this issue Oct 23, 2023
Ref: #43465
PR-URL: #50227
Refs: #43465
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Vinícius Lourenço Claro Cardoso <[email protected]>
Reviewed-By: Marco Ippolito <[email protected]>
Reviewed-By: Filip Skokan <[email protected]>
alexfernandez pushed a commit to alexfernandez/node that referenced this issue Nov 1, 2023
Ref: nodejs#43465
PR-URL: nodejs#50227
Refs: nodejs#43465
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Vinícius Lourenço Claro Cardoso <[email protected]>
Reviewed-By: Marco Ippolito <[email protected]>
Reviewed-By: Filip Skokan <[email protected]>
targos pushed a commit that referenced this issue Nov 11, 2023
Ref: #43465
PR-URL: #50227
Refs: #43465
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Vinícius Lourenço Claro Cardoso <[email protected]>
Reviewed-By: Marco Ippolito <[email protected]>
Reviewed-By: Filip Skokan <[email protected]>
@mhdawson
Copy link
Member

Seen this several times on osx - https://ci.nodejs.org/job/node-test-commit-osx-arm/nodes=osx11/14659/ as an example

targos pushed a commit that referenced this issue Nov 27, 2023
Ref: #43465
PR-URL: #50227
Refs: #43465
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Vinícius Lourenço Claro Cardoso <[email protected]>
Reviewed-By: Marco Ippolito <[email protected]>
Reviewed-By: Filip Skokan <[email protected]>
sercher added a commit to sercher/graaljs that referenced this issue Apr 25, 2024
Ref: nodejs/node#43465
PR-URL: nodejs/node#50227
Refs: nodejs/node#43465
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Vinícius Lourenço Claro Cardoso <[email protected]>
Reviewed-By: Marco Ippolito <[email protected]>
Reviewed-By: Filip Skokan <[email protected]>
sercher added a commit to sercher/graaljs that referenced this issue Apr 25, 2024
Ref: nodejs/node#43465
PR-URL: nodejs/node#50227
Refs: nodejs/node#43465
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Vinícius Lourenço Claro Cardoso <[email protected]>
Reviewed-By: Marco Ippolito <[email protected]>
Reviewed-By: Filip Skokan <[email protected]>
aduh95 pushed a commit that referenced this issue May 11, 2024
PR-URL: #45722
Fixes: #43465
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Paolo Insogna <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
targos pushed a commit that referenced this issue May 12, 2024
PR-URL: #45722
Fixes: #43465
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Paolo Insogna <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
marco-ippolito pushed a commit that referenced this issue Jun 17, 2024
PR-URL: #45722
Fixes: #43465
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Paolo Insogna <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
marco-ippolito pushed a commit that referenced this issue Jun 17, 2024
PR-URL: #45722
Fixes: #43465
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Paolo Insogna <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
marco-ippolito pushed a commit that referenced this issue Jun 17, 2024
PR-URL: #45722
Fixes: #43465
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Paolo Insogna <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
sophonieb pushed a commit to sophonieb/node that referenced this issue Jun 20, 2024
PR-URL: nodejs#45722
Fixes: nodejs#43465
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Paolo Insogna <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
bmeck pushed a commit to bmeck/node that referenced this issue Jun 22, 2024
PR-URL: nodejs#45722
Fixes: nodejs#43465
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Paolo Insogna <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
@lpinca lpinca reopened this Sep 7, 2024
@szmarczak
Copy link
Member

szmarczak commented Oct 17, 2024

Is it possible to run wireshark (or any other pcap generating network capturing software) on CI for that test and export the pcap file for debugging? Then if this test fails we can inspect what actually went wrong. I have never used Jenkins, does it have some sort of artifacts like GitHub Actions?

I wonder if it actually ever failed on GitHub Actions.

@szmarczak
Copy link
Member

@nicksia-vgw If you're still able to reproduce the issue, could you send me a pcap file (you can use Wireshark)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants