Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

investigate flaky test-timers-promisified #37226

Closed
Trott opened this issue Feb 4, 2021 · 34 comments · Fixed by #37230 · May be fixed by #46644
Closed

investigate flaky test-timers-promisified #37226

Trott opened this issue Feb 4, 2021 · 34 comments · Fixed by #37230 · May be fixed by #46644
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. timers Issues and PRs related to the timers subsystem / setImmediate, setInterval, setTimeout.

Comments

@Trott
Copy link
Member

Trott commented Feb 4, 2021

Unfortunately, this test is unreliable and is failing in CI on Raspberry Pi devices. I also can make it fail trivially on my macOS laptop by running tools/test.py -j96 --repeat=192 test/parallel/test-timers-promisified.js. I haven't looked closely (yet) but in my experience, having a magic number delay variable like this in a timers test indicates an assumption that the host isn't so slow that (say) a 10ms second timer won't fire 50ms later. This assumption is, of course, incorrect if the machine has low CPU/memory (like a Raspberry Pi device) or if there are a lot of other things competing for the machine's resources (-j96 --repeat=192).

Originally posted by @Trott in #37153 (comment)

@Trott Trott added the flaky-test Issues and PRs related to the tests with unstable failures on the CI. label Feb 4, 2021
@Trott
Copy link
Member Author

Trott commented Feb 4, 2021

https://ci.nodejs.org/job/node-test-binary-arm-12+/9172/RUN_SUBSET=0,label=pi2-docker/console

00:21:00 not ok 570 parallel/test-timers-promisified
00:21:00   ---
00:21:00   duration_ms: 2.154
00:21:00   severity: fail
00:21:00   exitcode: 1
00:21:00   stack: |-
00:21:00     node:internal/process/promises:227
00:21:00               triggerUncaughtException(err, true /* fromPromise */);
00:21:00               ^
00:21:00     
00:21:00     AssertionError [ERR_ASSERTION]: iterations was 2 < 3
00:21:00         at /home/iojs/build/workspace/node-test-binary-arm/test/parallel/test-timers-promisified.js:370:14
00:21:00         at /home/iojs/build/workspace/node-test-binary-arm/test/common/index.js:376:15
00:21:00         at runNextTicks (node:internal/process/task_queues:59:5)
00:21:00         at processTimers (node:internal/timers:496:9) {
00:21:00       generatedMessage: false,
00:21:00       code: 'ERR_ASSERTION',
00:21:00       actual: false,
00:21:00       expected: true,
00:21:00       operator: '=='
00:21:00     }
00:21:00   ...

@Trott Trott added the timers Issues and PRs related to the timers subsystem / setImmediate, setInterval, setTimeout. label Feb 4, 2021
@Trott
Copy link
Member Author

Trott commented Feb 4, 2021

@nodejs/timers

@Trott
Copy link
Member Author

Trott commented Feb 4, 2021

Relevant failing test case:

{
// Check that if we abort when we have some callbacks left,
// we actually call them.
const controller = new AbortController();
const { signal } = controller;
const delay = 10;
let totalIterations = 0;
const timeoutLoop = runInterval(async (iterationNumber) => {
if (iterationNumber === 2) {
await setTimeout(delay * 2);
controller.abort();
}
if (iterationNumber > totalIterations) {
totalIterations = iterationNumber;
}
}, delay, signal);
timeoutLoop.catch(common.mustCall(() => {
assert.ok(totalIterations >= 3, `iterations was ${totalIterations} < 3`);
}));
}
}

@benjamingr
Copy link
Member

cc @Linkgoron

@Trott
Copy link
Member Author

Trott commented Feb 4, 2021

It's also trivial to make it fail also by changing delay to 1 on line 357. That would suggest that we should avoid the temptation to make the test more reliable by increasing that value because that masks the problem but doesn't solve it.

@benjamingr
Copy link
Member

@Trott I went on a quick call with @Linkgoron (we know each other in real life) and we agreed he'll open a PR removing this test for now in the meantime.

Then we should investigate why the error happens - I thought that eventhough timers didn't have a guarantee regarding running exactly after N milliseconds they were at least ordered. Apparently I was wrong.

@Trott
Copy link
Member Author

Trott commented Feb 4, 2021

Then we should investigate why the error happens - I thought that eventhough timers didn't have a guarantee regarding running exactly after N milliseconds they were at least ordered. Apparently I was wrong.

I think the problem might be that timers and intervals are in separate queues and ordering there is not guaranteed? I think? Or something close enough to that such that the solution may be to replace the timer on line 361 with an interval. Still investigating....

@Linkgoron
Copy link
Member

Linkgoron commented Feb 4, 2021

@Trott There is another test that is also dependant on such ordering (the last one in the test-timers-promisifed.js) is it also introducing flakiness? The one on lines 375-393 (I couldn't make it fail using tools/test.py -j96 --repeat=192 test/parallel/test-timers-promisified.js).

@Trott
Copy link
Member Author

Trott commented Feb 4, 2021

Then we should investigate why the error happens - I thought that eventhough timers didn't have a guarantee regarding running exactly after N milliseconds they were at least ordered. Apparently I was wrong.

I think the problem might be that timers and intervals are in separate queues and ordering there is not guaranteed? I think? Or something close enough to that such that the solution may be to replace the timer on line 361 with an interval. Still investigating....

Hmm, the intervals are firing out of order, and I do think that's a bug?

Here's my modified version of the test with some really crude debugging going on. Using console.log() will introduce delays/synchronicit/something that makes it harder to cause the test to fail. So I use a foo variable to collect information instead and then print it out when everything is done.

{
    // Check that if we abort when we have some callbacks left,
    // we actually call them.
    const controller = new AbortController();
    const { signal } = controller;
    const delay = 1;
    let totalIterations = 0;
    let foo = '';
    const timeoutLoop = runInterval(async (iterationNumber) => {
      foo += `start ${iterationNumber}\n`;
      if (iterationNumber === 2) {
        foo += 'awaiting\n';
        await setTimeout(delay * 2);
        foo += 'aborting\n';
        controller.abort();
        foo += 'aborted\n';
      }
      if (iterationNumber > totalIterations) {
        totalIterations = iterationNumber;
      }
    }, delay, signal);

    timeoutLoop.catch(common.mustCall(() => {
      assert.ok(totalIterations >= 3, `iterations was ${totalIterations} < 3, ${foo}`);
      console.log(foo);
    }));
  }
}

Most of the time, the test succeeds and the output looks like this:

$ ./node --no-warnings --expose-internals test/parallel/test-timers-promisified.js
start 1
start 2
awaiting
aborting
aborted
start 3

$ 

But fairly often, the test fails and the output instead looks like this:

$ ./node --no-warnings --expose-internals test/parallel/test-timers-promisified.js
node:internal/process/promises:227
          triggerUncaughtException(err, true /* fromPromise */);
          ^

AssertionError [ERR_ASSERTION]: iterations was 2 < 3, start 1
start 2
awaiting
aborting
aborted

    at /Users/trott/io.js/test/parallel/test-timers-promisified.js:375:14
    at /Users/trott/io.js/test/common/index.js:376:15
    at runNextTicks (node:internal/process/task_queues:59:5)
    at processTimers (node:internal/timers:496:9) {
  generatedMessage: false,
  code: 'ERR_ASSERTION',
  actual: false,
  expected: true,
  operator: '=='
}
$ 

Note there is no start 1!

So, if that should be impossible, if the functions should be executing in order, then that's a legit bug to fix. On the other hand, if that should be possible, then the test needs to be changed to accommodate it.

@Trott
Copy link
Member Author

Trott commented Feb 4, 2021

Maybe the ordering problem is caused by this bit in the test?

    for await (const value of interval) {
      assert.strictEqual(value, input);
      iteration++;
      await fn(iteration);
    }

The first await "pauses" execution for iteration 1 and then iteration 2 comes along and manages to get to the second await? Something like that?

@Trott
Copy link
Member Author

Trott commented Feb 4, 2021

@Trott There is another test that is also dependant on such ordering (the last one in the test-timers-promisifed.js) is it also introducing flakiness? The one on lines 375-393 (I couldn't make it fail using tools/test.py -j96 --repeat=192 test/parallel/test-timers-promisified.js).

@Linkgoron Yeah, I can't make the other one fail either. If it is also not 100% reliable, it is much closer to 100% than the one we're talking about.

@benjamingr
Copy link
Member

Hmm, the intervals are firing out of order, and I do think that's a bug?

Yes, I tend to agree this is a bug in our timers.

I don't actually think the queues are different (I checked the code and that's my understanding). @Linkgoron also mentioned the test fails even if you use setInterval in all cases (without mixing setInterval/setTimeout)

@Trott
Copy link
Member Author

Trott commented Feb 4, 2021

Note there is no start 1!

Whoops, there is a start 1 and I just missed it because of line-wrapping. 🤦‍♂️

@Trott
Copy link
Member Author

Trott commented Feb 4, 2021

Yes, I tend to agree this is a bug in our timers.

Yeah, it's either a bug in our timer ordering or else a bug in the test runInterval() wrapper logic.

@Trott
Copy link
Member Author

Trott commented Feb 4, 2021

OK, so sorry for all the bad information. Intervals are not firing out of order as far as I can tell. It does appear that the issue is simply that await setTimeout() doesn't cause a long enough pause on a slow machine for the queue to catch up. Back to the original diagnosis.....

@Linkgoron
Copy link
Member

Linkgoron commented Feb 4, 2021

Yes, I tend to agree this is a bug in our timers.

Yeah, it's either a bug in our timer ordering or else a bug in the test runInterval() wrapper logic.

I don't think it's the runInterval logic. At least, the following "rewrite" is also flaky (works when run with the regular command, but crashes with your command):

async function testRun() {
      const controller = new AbortController();
      const { signal } = controller;
      const interval = setInterval(10, undefined, { signal });
      const iterator = interval[Symbol.asyncIterator]();
      return iterator.next()
        .then(() => setTimeout(20))
        .then(() => controller.abort())
        .then(() => iterator.next())
        .catch(common.mustNotCall())
        .then(() => iterator.return());
    }
    testRun().then(common.mustCall());

@Trott
Copy link
Member Author

Trott commented Feb 4, 2021

OK, so sorry for all the bad information. Intervals are not firing out of order as far as I can tell. It does appear that the issue is simply that await setTimeout() doesn't cause a long enough pause on a slow machine for the queue to catch up. Back to the original diagnosis.....

Forgot the details....

Code:

{
  let foo = '\n';

  async function runInterval(fn, intervalTime, signal) {
    const input = 'foobar';
    const interval = setInterval(intervalTime, input, { signal });
    let iteration = 0;
    for await (const value of interval) {
      foo += `got ${value} from for await\n`;
      assert.strictEqual(value, input);
      iteration++;
      await fn(iteration);
      foo += `ran function for ${iteration}\n`;
    }
  }

  {
    // Check that if we abort when we have some callbacks left,
    // we actually call them.
    const controller = new AbortController();
    const { signal } = controller;
    const delay = 1;
    let totalIterations = 0;
    const timeoutLoop = runInterval(async (iterationNumber) => {
      foo += `start ${iterationNumber}\n`;
      if (iterationNumber === 2) {
        foo += 'awaiting\n';
        await setTimeout(delay * 2);
        foo += 'aborting\n';
        controller.abort();
        foo += 'aborted\n';
      }
      if (iterationNumber > totalIterations) {
        totalIterations = iterationNumber;
      }
    }, delay, signal);

    timeoutLoop.catch(common.mustCall(() => {
      assert.ok(totalIterations >= 3, `iterations was ${totalIterations} < 3, ${foo}`);
      console.log(foo);
    }));
  }
}

Output on success:

got foobar from for await
start 1
ran function for 1
got foobar from for await
start 2
awaiting
aborting
aborted
ran function for 2
got foobar from for await
start 3
ran function for 3

Output on failure:

AssertionError [ERR_ASSERTION]: iterations was 2 < 3, 
got foobar from for await
start 1
ran function for 1
got foobar from for await
start 2
awaiting
aborting
aborted
ran function for 2

    at /Users/trott/io.js/test/parallel/test-timers-promisified.js:378:14
    at /Users/trott/io.js/test/common/index.js:376:15
    at runNextTicks (node:internal/process/task_queues:59:5)
    at processTimers (node:internal/timers:496:9) {
  generatedMessage: false,
  code: 'ERR_ASSERTION',
  actual: false,
  expected: true,
  operator: '=='
}

The conclusion I draw from that is that the await setTimeout() isn't of sufficient duration to guarantee another interval gets to run. It's tempting to simply increase the millisecond delay for setTimeout() but again, I think that probably masks the issue. But I'm not sure what a better solution is.

@Linkgoron
Copy link
Member

Linkgoron commented Feb 4, 2021

@Trott
BTW, instead of doing setTimeout(delay*2), doing setTimeout twice with delay works without issues for me:

{
    // Check that if we abort when we have some callbacks left,
    // we actually call them.
    const controller = new AbortController();
    const { signal } = controller;
    const delay = 10;
    let totalIterations = 0;
    const timeoutLoop = runInterval(async (iterationNumber) => {
      if (iterationNumber === 2) {
        await setTimeout(delay);
        await setTimeout(delay);
        controller.abort();
      }
      if (iterationNumber > totalIterations) {
        totalIterations = iterationNumber;
      }
    }, delay, signal);

    timeoutLoop.catch(common.mustCall(() => {
      assert.ok(totalIterations >= 3, `iterations was ${totalIterations} < 3`);
    }));
  }
}

@Trott
Copy link
Member Author

Trott commented Feb 4, 2021

BTW, instead of doing setTimeout(delay*2), doing setTimeout twice with delay works without issues for me:

This is interesting. I'm finding that if I use just one setTimeout(delay), then it's reliable. But the idea that setTimeout(delay) is reliable when setTimeout(delay * 2) is not is...puzzling.

@Trott
Copy link
Member Author

Trott commented Feb 4, 2021

This is interesting. I'm finding that if I use just one setTimeout(delay), then it's reliable. But the idea that setTimeout(delay) is reliable when setTimeout(delay * 2) is not is...puzzling.

Which puts us back into "bug in timers" territory rather than "bug in test"? I guess? I'm a mess. This is what debugging does to me.

@Trott
Copy link
Member Author

Trott commented Feb 4, 2021

Yeah, this is really strange. Changing delay * 2 to delay (and changing nothing else in the test--just that one line) makes it reliable for me.

For that matter, changing it to delay * 2 - 1 makes it reliable as well.

Is that what you're seeing too? Is this some heuristic in the timers that groups things together based on...ratios or something? That would seem to be the obvious explanation for this:

9 = ERROR
10 through 19 = OK
20 = ERROR

Other ideas?

@Trott
Copy link
Member Author

Trott commented Feb 4, 2021

Yeah, this is really strange. Changing delay * 2 to delay (and changing nothing else in the test--just that one line) makes it reliable for me.

For that matter, changing it to delay * 2 - 1 makes it reliable as well.

Forgot to also mention that delay - 1 breaks the test again. Which is a big part of what makes it strange.

delay - 1 is broken, but delay is reliable.
delay * 2 is broken, but delay * 2 - 1 is reliable.

@Linkgoron
Copy link
Member

Linkgoron commented Feb 4, 2021

According to internal/timers.js timers with the same duration are kept in the same linked list where executing the different lists is based on priority queues. Obviously this is a long-shot based on what I've barely skimmed, but maybe if they're in the same list everything is OK (i.e. have the same delay), as the two timers are really dependent and this would probably always work correctly.

Regarding using the different values, I assume that there's some issue with the priority queue where different values create different heaps (the priority-queue is built using a heap) and somehow they're prioritised differently.

@Fishrock123
Copy link
Contributor

Fishrock123 commented Feb 4, 2021

If timers are no longer ordered that means someone really messed up merging something in the last year as several tests would be rendered unstable and likely difficult to fix, and it would probably also mess with user code too. We probably should have just switched to making the ordering guarantee officially

I suggest someone look into that.

Ideally my write up should still be true:

// HOW and WHY the timers implementation works the way it does.
//
// Timers are crucial to Node.js. Internally, any TCP I/O connection creates a
// timer so that we can time out of connections. Additionally, many user
// libraries and applications also use timers. As such there may be a
// significantly large amount of timeouts scheduled at any given time.
// Therefore, it is very important that the timers implementation is performant
// and efficient.
//
// Note: It is suggested you first read through the lib/internal/linkedlist.js
// linked list implementation, since timers depend on it extensively. It can be
// somewhat counter-intuitive at first, as it is not actually a class. Instead,
// it is a set of helpers that operate on an existing object.
//
// In order to be as performant as possible, the architecture and data
// structures are designed so that they are optimized to handle the following
// use cases as efficiently as possible:
// - Adding a new timer. (insert)
// - Removing an existing timer. (remove)
// - Handling a timer timing out. (timeout)
//
// Whenever possible, the implementation tries to make the complexity of these
// operations as close to constant-time as possible.
// (So that performance is not impacted by the number of scheduled timers.)
//
// Object maps are kept which contain linked lists keyed by their duration in
// milliseconds.
//
/* eslint-disable node-core/non-ascii-character */
//
// ╔════ > Object Map
// ║
// ╠══
// ║ lists: { '40': { }, '320': { etc } } (keys of millisecond duration)
// ╚══ ┌────┘
// │
// ╔══ │
// ║ TimersList { _idleNext: { }, _idlePrev: (self) }
// ║ ┌────────────────┘
// ║ ╔══ │ ^
// ║ ║ { _idleNext: { }, _idlePrev: { }, _onTimeout: (callback) }
// ║ ║ ┌───────────┘
// ║ ║ │ ^
// ║ ║ { _idleNext: { etc }, _idlePrev: { }, _onTimeout: (callback) }
// ╠══ ╠══
// ║ ║
// ║ ╚════ > Actual JavaScript timeouts
// ║
// ╚════ > Linked List
//
/* eslint-enable node-core/non-ascii-character */
//
// With this, virtually constant-time insertion (append), removal, and timeout
// is possible in the JavaScript layer. Any one list of timers is able to be
// sorted by just appending to it because all timers within share the same
// duration. Therefore, any timer added later will always have been scheduled to
// timeout later, thus only needing to be appended.
// Removal from an object-property linked list is also virtually constant-time
// as can be seen in the lib/internal/linkedlist.js implementation.
// Timeouts only need to process any timers currently due to expire, which will
// always be at the beginning of the list for reasons stated above. Any timers
// after the first one encountered that does not yet need to timeout will also
// always be due to timeout at a later time.
//
// Less-than constant time operations are thus contained in two places:
// The PriorityQueue — an efficient binary heap implementation that does all
// operations in worst-case O(log n) time — which manages the order of expiring
// Timeout lists and the object map lookup of a specific list by the duration of
// timers within (or creation of a new list). However, these operations combined
// have shown to be trivial in comparison to other timers architectures.

If it doesn't operate that way someone needs to either change it back or re-document it.

@Trott
Copy link
Member Author

Trott commented Feb 4, 2021

@Fishrock123 I'm pretty sure the issue (if it is around timing) involves only the promises.* APIs (and quite possibly only promises.setInterval(). So that may explain why it hasn't been setting off alarm bells.

That said, I think the issue may be in the test after all. Next comment coming up....

@Trott
Copy link
Member Author

Trott commented Feb 4, 2021

@Linkgoron @benjamingr Does this correctly describes what the test is doing/what the expectations are?

If controller.abort() is called while an instance of the interval is unresolved, then that unresolved promise should still be permitted to finish (resolve or reject). It should not be terminated while awaiting a result. It should not be auto-rejected.

If so, I think the fix is to get rid of setTimeout() altogether. I would expect it to be unreliable (although I admit that I am still curious about the whole "reliable within a certain hard-to-explain range" quality, if anyone is able to replicate what I'm seeing locally).

@Linkgoron
Copy link
Member

Linkgoron commented Feb 4, 2021

@Trott
I think I've managed to reproduce this with the regular settimeout/setinterval.

The issue appears to be that there's an interval in a previous test that has the same timeout as the setTimeout (20). That's why it appears as though 19 etc. work (2*delay-1), it's just that there's another timer with the same value.

The following failed for me (once) when I executed it with high parallel:

setInterval(() => {
}, 20);

let i = 0;
setInterval(() => {
  if (++i === 2) {
    setTimeout(common.mustCall(function wat() {
      if (i === 2) {
        throw new Error('asdasd');
      }
    }), 20);
  }
  if (i === 10) {
    process.exit();
  }
}, 10);

I found this out when I accidentally changed line 339 in test-timers-promisified to delay = 30, suddenly the test stopped crashing.
I put the above in a file called test-timers-ordering-2.js (the setTimeout and setInterval are the callback versions)
image

@Linkgoron
Copy link
Member

@Trott @benjamingr

Yep, this test fails for me consistently when using the j96 --repeat=192 command.

const common = require('../common');
setInterval(() => {}, 20);

let i = 0;
setInterval(() => {
  if (++i === 2) {
    setTimeout(common.mustCall(function wat() {
      if (i === 2) {
        throw new Error('asdasd');
      }
    }), 20);
  }
  if (i === 10) {
    process.exit();
  }
}, 10);

Removing the first (empty) setInterval makes the test pass consistently with that command.

Trott added a commit to Trott/io.js that referenced this issue Feb 4, 2021
By using events, we can make sure the call to `controller.abort()`
doesn't happen before the next iteration starts.

Fixes: nodejs#37226
Trott added a commit to Trott/io.js that referenced this issue Feb 4, 2021
By using events, we can make sure the call to `controller.abort()`
doesn't happen before the next iteration starts.

Fixes: nodejs#37226
@Trott
Copy link
Member Author

Trott commented Feb 4, 2021

@Linkgoron Nice find. I'm glad there's a rational explanation for why 20 was exhibiting special behavior.

I think #37230 fixes this issue by using events to make sure the next interval has started before running controller.abort(). I am unable to make it fail and doing the string-based debugging shows stuff happening in the order I expect. (Third interval starts, then awaits, second interval runs controller.abort(), third interval finishes.)

@Linkgoron
Copy link
Member

Linkgoron commented Feb 6, 2021

@Trott OK, I'm not sure if I was the only one who didn't get it, but after thinking about this for a while, I think I know what the issue is.

Suppose that this is how the world "looks" like:
Timer number 1 has a timeout of 20 and needs to run at time 25
Timer number 2 has a timeout of 10 and needs to run at time 30
Timer number 3 has a timeout of 20 and needs to run at time 35

The correct order should be 1->2->3. However, my PC is so slow that it executes on time 40. The 20 list is the first in the priority queue (timerListQueue.peek()), because it has a lower expiry. We take it and run all of the timers in the list (listOnTimeout method), which are timers 1 and 3, and it only then executes the 10 timers list, i.e. timer 2, resulting in a 1->3->2 execution.

Trott added a commit to Trott/io.js that referenced this issue Feb 6, 2021
By using events, we can make sure the call to `controller.abort()`
doesn't happen before the next iteration starts.

Fixes: nodejs#37226
@benjamingr
Copy link
Member

I think that's a (pretty severe TBH) real bug in timers and we should fix the underlying issue.

@Fishrock123 do you remember why the timer lists are batched by the timeout (the parameter passed) and not the deadline (when they should execute)?

Trott added a commit to Trott/io.js that referenced this issue Feb 7, 2021
@Trott
Copy link
Member Author

Trott commented Feb 7, 2021

I think that's a (pretty severe TBH) real bug in timers

@benjamingr I think it's a known limitation. I imagine it was done this way for efficiency. As the docs say (emphasis added):

Node.js makes no guarantees about the exact timing of when callbacks will fire, nor of their ordering.

@Fishrock123
Copy link
Contributor

Fishrock123 commented Feb 7, 2021

@Fishrock123 do you remember why the timer lists are batched by the timeout (the parameter passed) and not the deadline (when they should execute)?

Please re-read the comment.

As the docs say (emphasis added):

Node.js makes no guarantees about the exact timing of when callbacks will fire, nor of their ordering.

This is untrue. There are guarentees about their ordering, as I stated, if the ordering changes from how it was several test cases will fail. That also doesn't mean timers are ordered in the way one might expect, though. I don't know why people keep using that comment as the source of truth when it is anything but that. It is simply a warning to unsuspecting users. We are not really those unsuspecting users.


Suppose that this is how the world "looks" like:

Timer number 1 has a timeout of 20 and needs to run at time 25
Timer number 2 has a timeout of 10 and needs to run at time 30
Timer number 3 has a timeout of 20 and needs to run at time 35

The correct order should be 1->2->3. However, my PC is so slow that it executes on time 40. The 20 list is the first in the priority queue (timerListQueue.peek()), because it has a lower expiry. We take it and run all of the timers in the list (listOnTimeout method), which are timers 1 and 3, and it only then executes the 10 timers list, i.e. timer 2, resulting in a 1->3->2 execution.

This sounds familiar, I wonder if there was code at some point before the binary heap was moved into JS (when each list was backed by a separate uv handle) that dealt with this.

I suggest searching through the old timers issues and PRs and seeing if anything similar exists.

@apapirovski
Copy link
Member

apapirovski commented Feb 15, 2023

This sounds familiar, I wonder if there was code at some point before the binary heap was moved into JS (when each list was backed by a separate uv handle) that dealt with this.

The issue has existed since the ring was first conceived of. There's no perfect solution here because we either need to sort all timers to get this level of accuracy, which is expensive, or we sort on some granularity level, which introduces some amount of supposedly acceptable accuracy-loss. The example given is very simplified but the reality is any n-th duration could have the actual next expiring timer.

Binary heap here just replaced there being a handle per timers list and requiring constant hand-off back to libuv.

XadillaX added a commit to XadillaX/node that referenced this issue Feb 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. timers Issues and PRs related to the timers subsystem / setImmediate, setInterval, setTimeout.
Projects
None yet
5 participants