Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

setTimeout callback executed before delay elapses #10154

Closed
marzelin opened this issue Dec 6, 2016 · 34 comments
Closed

setTimeout callback executed before delay elapses #10154

marzelin opened this issue Dec 6, 2016 · 34 comments
Assignees
Labels
blocked PRs that are blocked by other issues or PRs. libuv Issues and PRs related to the libuv dependency or the uv binding. question Issues that look for answers. timers Issues and PRs related to the timers subsystem / setImmediate, setInterval, setTimeout.

Comments

@marzelin
Copy link
Contributor

marzelin commented Dec 6, 2016

  • Version: 7.2.0
  • Platform: Ubuntu 16.10 64-bit

When the delay is set to 1, for about 5% of cases the logged time between executing setTimeout() and the callback execution is 0.

// main.js
const timeJustBeforeCallingSetTimeout = Date.now();
setTimeout(() => {
  const callbackExecutionTime = Date.now();
  console.log(callbackExecutionTime - timeJustBeforeCallingSetTimeout);
}, 1);
# bash script that runs above code 1000 times and counts cases when 0 is logged
rm -f results.txt
for ((n=0; n<1000; n++))
do
  node main.js >> results.txt
done
cat results.txt | grep 0 | wc -l

When the delay value is set to 2, there are no cases where the logged time is less than 2, which is an expected behavior.

@addaleax addaleax added libuv Issues and PRs related to the libuv dependency or the uv binding. question Issues that look for answers. timers Issues and PRs related to the timers subsystem / setImmediate, setInterval, setTimeout. labels Dec 6, 2016
@bnoordhuis
Copy link
Member

Cannot reproduce locally on a FC25 system.

A possible explanation is that Date.now() is backed by a CLOCK_REALTIME time source (gettimeofday) whereas node/libuv uses a CLOCK_MONOTONIC clock. If ntpdate is adjusting the system clock, you might see <= 0 intervals between two Date.now() calls.

What happens when you use process.hrtime() instead?

@marzelin
Copy link
Contributor Author

marzelin commented Dec 6, 2016

For code with process.hrtime() used instead of Date.now(), the percentage of cases with measured delay that is less than 1ms grows to 30%. The lowest time observed is ~930000ns.

// main.js
const timeJustBeforeCallingSetTimeout = process.hrtime();
setTimeout(() => {
  const delayBetweenMainCodeAndCallbackExecutionInNanoSec =
    process.hrtime(timeJustBeforeCallingSetTimeout)[1];
  console.log(delayBetweenMainCodeAndCallbackExecutionInNanoSec);
}, 1);
# bash script that runs js code 1000 times
rm -f results.txt
for ((n=0;n<1000;n++))
do
  node main.js >> results.txt
done
cat results.txt | grep '^.\{6\}$' | wc -l

@Fishrock123
Copy link
Contributor

@marzelin Could you try process.binding('timer_wrap').Timer.now() and see if it makes a difference? That is what timers use internally. (I think process.hrtime also uses that but not 100% sure.)

@marzelin
Copy link
Contributor Author

marzelin commented Dec 7, 2016

When time difference is measured with process.binding('timer_wrap').Timer.now() there are no cases where the difference is less than 1, which is the expected behavior.

// main.js
const timeJustBeforeCallingSetTimeout = process.binding('timer_wrap').Timer.now();
setTimeout(() => {
  const callbackExecutionTime = process.binding('timer_wrap').Timer.now();
  console.log(callbackExecutionTime - timeJustBeforeCallingSetTimeout);
}, 1);
# bash script that runs js code 1000 times and counts cases where time difference is 0
rm -f results.txt
for ((n=0;n<1000;n++))
do
  node main.js >> results.txt
done
cat results.txt | grep 0 | wc -l

@Fishrock123
Copy link
Contributor

@bnoordhuis any idea?

@bnoordhuis
Copy link
Member

@marzelin Are you running Ubuntu inside a VM or a container? I know VirtualBox has (had?) clock skew bugs in the past, perhaps other products do too.

@marzelin
Copy link
Contributor Author

marzelin commented Dec 8, 2016

@bnoordhuis Ubuntu is loaded directly by UEFI, no VMs are involved. On Windows 10 (same PC and node version) results are similar.

@bnoordhuis
Copy link
Member

I have been able to reproduce this a few times now (over 100,000s of runs) but so far I've only been able to establish that the kernel seems to return early from the epoll_wait() system call for no discernible reason whatsoever. This is with a stock 4.8.8-300.fc25.x86_64 kernel.

@marzelin Re: Windows: did you test with Date.now(), Timer.now() or process.hrtime()?

@marzelin
Copy link
Contributor Author

marzelin commented Dec 8, 2016

@bnoordhuis All of them, but I did only a few tests there. setTimeout() seems to behave in the same way as on Ubuntu.

@bnoordhuis
Copy link
Member

bnoordhuis commented Dec 9, 2016

I'm coming around to the idea that this might be a hardware issue. I've so far only been able to reproduce on one system and I noticed that this system's hardware clock drifts by one or two seconds every few hours.

@marzelin Can you post the output of:

$ cat /sys/devices/system/clocksource/clocksource*/available_clocksource
# And
$ cat /sys/devices/system/clocksource/clocksource*/current_clocksource

Your system probably only has one hardware clock. Try switching to different sources (e.g. from tsc to hpet) and see if the problem goes away.

EDIT: Oh, and please disable anything that syncs the clock; ntpd, ntpdate cron jobs, etc.

@marzelin
Copy link
Contributor Author

marzelin commented Dec 9, 2016

@bnoordhuis

$ cat /sys/devices/system/clocksource/clocksource*/available_clocksource
tsc hpet acpi_pm

$ cat /sys/devices/system/clocksource/clocksource*/current_clocksource
tsc

I've got the same results when clocksource is set to hpet or acpi_pm.

@marzelin
Copy link
Contributor Author

marzelin commented Dec 9, 2016

I did some other tests and it seems like process.binding('timer_wrap').Timer.now() takes some arbitrary point of time and looks only at number of milliseconds, omitting the fractional part.

So for example if we have point in time:
123 ms and 990,000 ns
after 10,000 ns Timer.now() will tell 1ms has passed, but in reality only 0,01 ms has passed as process.hrtime will show.

@xk
Copy link
Contributor

xk commented Dec 10, 2016

Node's timers have always been borken, check it out:
https://gist.github.com/xk/eb2e75c11ef0435a43438a373f8c0113

Edit:
Ooops, I was testing this with an old node v0.10.26 which errs all the time but newer nodes do not, at least since v0.12.0, or so it seems... Great! Thanks! :-)

@Fishrock123
Copy link
Contributor

@bnoordhuis any further thoughts?

@bnoordhuis
Copy link
Member

I'm still investigating this although I probably won't revisit it until January. You can assign it to me if you want.

@jBarz
Copy link
Contributor

jBarz commented Jan 13, 2017

I suspect that this is because we track loop->time in libuv in milliseconds.
loop->time = uv__hrtime(UV_CLOCK_FAST) / 1000000
So the difference between 2 different loop->time values could potentially show a 1 millisecond difference even when 1 nanosecond has actually elapsed and thus trigger timers too early.
This could be fixed by tracking loop->time in nanoseconds instead of milliseconds.
I tested this change and it seems to fix the issue (no more 0 second delays).
Let me know if this solution makes sense.
jBarz@84fbba5

@saghul
Copy link
Member

saghul commented Jan 13, 2017 via email

@jBarz
Copy link
Contributor

jBarz commented Jan 14, 2017

I have submitted a PR on the libuv repo
libuv/libuv#1191

@Trott
Copy link
Member

Trott commented Jul 15, 2017

Should this be closed (or at least labeled blocked) unless and until the libuv issue gets resolved?

@Trott Trott added the blocked PRs that are blocked by other issues or PRs. label Aug 10, 2017
@Trott
Copy link
Member

Trott commented Aug 10, 2017

Labeled blocked as (IIUC) nothing can happen until libuv/libuv#1191 lands first.

@ikokostya
Copy link

ikokostya commented Aug 11, 2017

Looks like I have the same issue which stability reproduced on the following test:

'use strict';

const assert = require('assert');

const DELAY = 10;

function call() {
    const start = process.hrtime();
    setTimeout(() => {
        const diff = process.hrtime(start);
        const [seconds, nanoseconds] = diff;
        const ms = seconds * 1e3 + nanoseconds * 1e-6;
        console.log(ms);
        assert(ms >= DELAY);
        call();
    }, DELAY);
}

call();
[kostya@arch ~]$ node -v
v8.1.3
[kostya@arch ~]$ node t
11.467041
9.704634
assert.js:60
  throw new errors.AssertionError({
  ^

AssertionError [ERR_ASSERTION]: false == true
    at Timeout.setTimeout [as _onTimeout] (/home/kostya/t.js:14:9)
    at ontimeout (timers.js:488:11)
    at tryOnTimeout (timers.js:323:5)
    at Timer.listOnTimeout (timers.js:283:5)
Next 3 consecutive runs:
[kostya@arch ~]$ node t
12.213059
9.567727999999999
assert.js:60
  throw new errors.AssertionError({
  ^

AssertionError [ERR_ASSERTION]: false == true
    at Timeout.setTimeout [as _onTimeout] (/home/kostya/t.js:14:9)
    at ontimeout (timers.js:488:11)
    at tryOnTimeout (timers.js:323:5)
    at Timer.listOnTimeout (timers.js:283:5)
[kostya@arch ~]$ node t
11.104375
9.571886
assert.js:60
  throw new errors.AssertionError({
  ^

AssertionError [ERR_ASSERTION]: false == true
    at Timeout.setTimeout [as _onTimeout] (/home/kostya/t.js:14:9)
    at ontimeout (timers.js:488:11)
    at tryOnTimeout (timers.js:323:5)
    at Timer.listOnTimeout (timers.js:283:5)
[kostya@arch ~]$ node t
10.487898999999999
10.68757
10.291964
10.234295999999999
10.246357999999999
10.16991
10.187873
10.203797999999999
9.151824
assert.js:60
  throw new errors.AssertionError({
  ^

AssertionError [ERR_ASSERTION]: false == true
    at Timeout.setTimeout [as _onTimeout] (/home/kostya/t.js:14:9)
    at ontimeout (timers.js:488:11)
    at tryOnTimeout (timers.js:323:5)
    at Timer.listOnTimeout (timers.js:283:5)

@ikokostya
Copy link

ikokostya commented Sep 4, 2017

@Trott Could you revisit this issue? Looks like this issue can be solved on nodejs side libuv/libuv#1191 (comment).

@Trott
Copy link
Member

Trott commented Sep 5, 2017

@ikokostya Your repro triggers the assertion for me on Node.js 8.4.0 but not on 9.0.0-pre (current master branch). Any chance you can compile from the master branch or grab the most recent nightly and see if it's fixed for you there?

@ikokostya
Copy link

The same error:

[kostya@arch node-v9.0.0-nightly20170904b24e269a48-linux-x86]$ ./bin/node -v
v9.0.0-nightly20170904b24e269a48
[kostya@arch node-v9.0.0-nightly20170904b24e269a48-linux-x86]$ ./bin/node t
10.539683
10.501904
10.166082
10.186332
10.193605
10.178288
10.197446
10.175659
10.184133
10.203318
10.162325
10.159507
10.157172
9.153143
assert.js:41
  throw new errors.AssertionError({
  ^

AssertionError [ERR_ASSERTION]: false == true
    at Timeout.setTimeout [as _onTimeout] (/home/kostya/Downloads/node-v9.0.0-nightly20170904b24e269a48-linux-x86/t.js:14:9)
    at ontimeout (timers.js:471:11)
    at tryOnTimeout (timers.js:305:5)
    at Timer.listOnTimeout (timers.js:265:5)

@Trott
Copy link
Member

Trott commented Sep 6, 2017

Interesting that I can't replicate on macOS.

@Fishrock123 @misterdjules Any thoughts on the fix for #10154 (comment) suggested by @jBarz at libuv/libuv#1191 (comment)?

@ikokostya
Copy link

Any news? I still get error for nodejs 9.0.0.

@apapirovski
Copy link
Member

@bnoordhuis Can anything be done here?

@bnoordhuis
Copy link
Member

@jBarz Per libuv/libuv#1191 (comment), do you want to open a pull request here?

@apapirovski
Copy link
Member

@bnoordhuis @jBarz I'm not sure the proposed fix there solves the issue. We already have something very similar (setting timeRemaining = 1) which works fine for most cases but it only runs when diff < msecs which in the reproduction above from @ikokostya is not actually the case.

@jBarz
Copy link
Contributor

jBarz commented Apr 11, 2018

I've tried the fix myself and it seems to resolve the issue.

@apapirovski Are you able to recreate the error using @ikokostya test case with the proposed fix?

You're right, the case is not diff < msecs
However, the case fails because (TimerWrap.now() - timer._idleStart) has at most a 1 millisecond rounding error because they both capture time by dividing the high resolution timer by 1000000.
The fix is simply adding 1ms to timeRemaining for all cases.
I can open a PR if this sounds ok.

@apapirovski
Copy link
Member

@jBarz yeah, I see what you mean now. Not a huge fan of adding 1ms to all timers (the magic number is a bit of a code smell) that run through that block but I'm not really seeing another solution. I suppose it depends on whether we consider this important to fix or if infrequent 1ms inaccuracy is acceptable. FWIW it should happen a lot less now than it used to when this issue was made.

@jBarz
Copy link
Contributor

jBarz commented Apr 12, 2018

Yea, I agree that adding 1ms to all timers is not ideal.
I guess we could consider that test case as invalid because it is using the high resolution process.hrtime to measure accuracy of the low res uv_now.
So nothing to fix here.

@bnoordhuis
Copy link
Member

Okay, I'll close this out then.

@apapirovski
Copy link
Member

apapirovski commented May 12, 2018

This can be fixed by: #20555

(I've had the test case running for an hour with no failures. Pretty sure it resolves the issue.)

Edit: I've taken out that bit because it made that PR too big but it'll be fixed in a follow-up PR. Either way, I'm keeping this open since it can be solved by Node.js.

apapirovski added a commit that referenced this issue Jun 25, 2018
Refactor Timers to behave more similarly to Immediates by having
a single uv_timer_t handle which is stored on the Environment.

No longer expose timers in a public binding and instead make
it part of the internalBinding.

PR-URL: #20894
Fixes: #10154
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Jeremiah Senkpiel <[email protected]>
Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
Reviewed-By: Gus Caplan <[email protected]>
apapirovski added a commit that referenced this issue Jun 25, 2018
This is no longer necessary in the only place it was used (timers).

PR-URL: #20894
Fixes: #10154
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Jeremiah Senkpiel <[email protected]>
Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
Reviewed-By: Gus Caplan <[email protected]>
targos pushed a commit to targos/node that referenced this issue Jul 31, 2018
The timers directory test, utilizing FakeTime, has not worked in
quite a while and is not truly testing Node.js behaviour. If a
similar test is necessary it would be better suited to libuv
on which Node.js relies for timers functionality.

PR-URL: nodejs#20894
Fixes: nodejs#10154
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Jeremiah Senkpiel <[email protected]>
Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
Reviewed-By: Gus Caplan <[email protected]>
targos pushed a commit that referenced this issue Aug 2, 2018
The timers directory test, utilizing FakeTime, has not worked in
quite a while and is not truly testing Node.js behaviour. If a
similar test is necessary it would be better suited to libuv
on which Node.js relies for timers functionality.

PR-URL: #20894
Fixes: #10154
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Jeremiah Senkpiel <[email protected]>
Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
Reviewed-By: Gus Caplan <[email protected]>

Backport-PR-URL: #22039
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Minwoo Jung <[email protected]>
michael-c-snyk added a commit to snyk/cli that referenced this issue Nov 19, 2020
In the modified test file, sleep() is backed by setTimeout(), and some Node runtimes might execute setTimeout callback before the specified delay, meaning that tmi.getValue() will occasionally return `9`. This produces a flaky test. An extra ms leeway in the assertion works round the issue. nodejs/node#10154

Test name updated to reflect that the timer's accuracy does not have single-digit precision
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked PRs that are blocked by other issues or PRs. libuv Issues and PRs related to the libuv dependency or the uv binding. question Issues that look for answers. timers Issues and PRs related to the timers subsystem / setImmediate, setInterval, setTimeout.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants