-
Notifications
You must be signed in to change notification settings - Fork 461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Intermittent failure in ThreadSafe function #906
Comments
Also seeing occasional failure like this one: https://ci.nodejs.org/job/node-test-node-addon-api-new/nodes=ubuntu1604-64/3564/console Running test 'threadsafe_function/threadsafe_function'
Running test 'threadsafe_function/threadsafe_function_ctx'
Running test 'threadsafe_function/threadsafe_function_existing_tsfn'
Running test 'threadsafe_function/threadsafe_function_ptr'
Running test 'threadsafe_function/threadsafe_function_sum'
Running test 'threadsafe_function/threadsafe_function_unref'
Running test 'typed_threadsafe_function/typed_threadsafe_function'
Running test 'typed_threadsafe_function/typed_threadsafe_function_ctx'
Running test 'typed_threadsafe_function/typed_threadsafe_function_existing_tsfn'
Running test 'typed_threadsafe_function/typed_threadsafe_function_ptr'
Running test 'typed_threadsafe_function/typed_threadsafe_function_sum'
Running test 'typed_threadsafe_function/typed_threadsafe_function_unref'
Running test 'typedarray-bigint'
Running test 'typedarray'
Running test 'version_management'
All tests passed!
Tests aborted with SIGSEGV
npm ERR! code 1
npm ERR! path /home/iojs/build/workspace/node-test-node-addon-api-new/nodes/ubuntu1604-64/node-addon-api
npm ERR! command failed
npm ERR! command sh -c node test
npm ERR! A complete log of this run can be found in:
npm ERR! /home/iojs/build/workspace/node-test-node-addon-api-new/nodes/ubuntu1604-64/npm-cache/_logs/2021-02-06T09_34_12_368Z-debug.log
Build step 'Conditional steps (multiple)' marked build as failure
Sending e-mails to: [email protected] [email protected] [email protected]
Collecting metadata...
Metadata collection done.
Notifying upstream projects of job completion
Finished: FAILURE |
@mhdawson this looks like it's against Node.js nightly. Might it be a regression in core? |
The stack doesn't seem to contain any of our frames – unless I'm glossing over some. |
I built against nodejs/node@1c6484b and got
|
Hmmm ... I can't reproduce that crash. |
Trying to reproduce inside an Ubuntu 16.04 container. |
The crash on shutdown with master ow seems to recreate consistently:
@gabrielschulhof I'm wondering if that might be related to the change from nodejs/node#37616. Any chance you still have the environment you had for that PR to compare before/after the change? |
Cutting down the test case this is the one that is failing: Starting program: /home/midawson/node/node-v16.0.0-nightly202103215318e53fd8-linux-x64/bin/node --expose-gc --no-concurrent-array-buffer-sweeping /home/midawson/newpull/land/node-addon-api/test/objectreference.js
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-127.el8.x86_64
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff6cbd700 (LWP 334846)]
[New Thread 0x7ffff64bc700 (LWP 334847)]
[New Thread 0x7ffff5cbb700 (LWP 334848)]
[New Thread 0x7ffff54ba700 (LWP 334849)]
[New Thread 0x7ffff4cb9700 (LWP 334850)]
[New Thread 0x7ffff7ff6700 (LWP 334851)]
Thread 1 "node" received signal SIGSEGV, Segmentation fault.
0x0000000000e7acb0 in v8::internal::GlobalHandles::ClearWeakness(unsigned long*) ()
Missing separate debuginfos, use: yum debuginfo-install libgcc-8.3.1-5.1.el8.x86_64 libstdc++-8.3.1-5.1.el8.x86_64
(gdb) |
Cut down test case to these 2 and it still crashes: () => {
binding.objectreference.setObjects("hello", "world", "javascript");
const test = binding.objectreference.getFromValue();
const test2 = binding.objectreference.getFromValue("hello");
assert.deepEqual({ hello: "world" }, test);
assert.deepEqual({ hello: "world" }, test2);
assert.deepEqual(test, test2);
},
() => {
binding.objectreference.setObjects(1, "hello world");
const test = binding.objectreference.getFromValue();
const test2 = binding.objectreference.getFromGetter(1);
assert.deepEqual({ 1: "hello world"}, test);
assert.equal("hello world", test2);
assert.equal(test[1], test2);
}, |
But not with just one of the 2 |
Just these 2 also seems to recreate () => {
binding.objectreference.setObjects(1, "hello world");
const test = binding.objectreference.getFromValue();
const test2 = binding.objectreference.getFromGetter(1);
assert.deepEqual({ 1: "hello world"}, test);
assert.equal("hello world", test2);
assert.equal(test[1], test2);
},
() => {
binding.objectreference.setObjects(0, "hello");
binding.objectreference.setObjects(1, "world");
const test = binding.objectreference.getFromValue();
const test2 = binding.objectreference.getFromGetter(0);
const test3 = binding.objectreference.getFromGetter(1);
assert.deepEqual({ 1: "world"}, test);
assert.equal(undefined, test2);
assert.equal("world", test3);
},
|
May require 2 tests so that a gc is forced after one of the tests complete ? |
Cutting down the tests to just: // info[0] is the key, which can be either a string or a number.
// info[1] is the value.
// info[2] is a flag that differentiates whether the key is a
// C string or a JavaScript string.
void SetObjects(const CallbackInfo& info) {
Env env = info.Env();
HandleScope scope(env);
weak = Weak(Object::New(env));
weak.SuppressDestruct();
/* persistent = Persistent(Object::New(env));
persistent.SuppressDestruct();
reference = Reference<Object>::New(Object::New(env), 2);
reference.SuppressDestruct();
*/
if (info[0].IsString()) {
if (info[2].As<String>() == String::New(env, "javascript")) {
// weak.Set(info[0].As<String>(), info[1]);
// persistent.Set(info[0].As<String>(), info[1]);
// reference.Set(info[0].As<String>(), info[1]);
} else {
weak.Set(info[0].As<String>().Utf8Value(), info[1]);
persistent.Set(info[0].As<String>().Utf8Value(), info[1]);
reference.Set(info[0].As<String>().Utf8Value(), info[1]);
}
} else if (info[0].IsNumber()) {
weak.Set(info[0].As<Number>(), info[1]);
persistent.Set(info[0].As<Number>(), info[1]);
reference.Set(info[0].As<Number>(), info[1]);
}
} () => {
binding.objectreference.setObjects("hello", "world", "javascript");
// const test = binding.objectreference.getFromValue();
// const test2 = binding.objectreference.getFromValue("hello");
// assert.deepEqual({ hello: "world" }, test);
// assert.deepEqual({ hello: "world" }, test2);
// assert.deepEqual(test, test2);
},
() => {
// binding.objectreference.setObjects(1, "hello world");
// const test = binding.objectreference.getFromValue();
// const test2 = binding.objectreference.getFromGetter(1);
// assert.deepEqual({ 1: "hello world"}, test);
// assert.equal("hello world", test2);
// assert.equal(test[1], test2);
}, still recreates So it seems like just creating a Weak causes it? The strange thing is that I would have thought the C node-api tests would have caught an issue like that. |
I've got to think its related to nodejs/node#37616, give the output of the crash being:Thread 1 "node" received signal SIGSEGV, Segmentation fault.
|
I'm done today and booked for most of tomorrow. @gabrielschulhof I also changed nodejs/node#37802 (review) to "requested changes" so that we don't land until we figure this out. |
More complete stack trace: 000000e7c930 in v8::internal::GlobalHandles::ClearWeakness(unsigned long*) ()
Missing separate debuginfos, use: yum debuginfo-install libgcc-8.3.1-5.1.el8.x86_64 libstdc++-8.3.1-5.1.el8.x86_64
(gdb) bt
#0 0x0000000000e7c930 in v8::internal::GlobalHandles::ClearWeakness(unsigned long*) ()
#1 0x0000000000aaaaff in v8impl::(anonymous namespace)::Reference::Finalize(bool) ()
#2 0x0000000000ac9c17 in node_napi_env__::~node_napi_env__() ()
#3 0x0000000000a924cb in node::Environment::RunCleanup() ()
#4 0x0000000000a486ac in node::FreeEnvironment(node::Environment*) ()
#5 0x0000000000b3dd8f in node::NodeMainInstance::Run(node::EnvSerializeInfo const*) ()
#6 0x0000000000ac3772 in node::Start(int, char**) ()
#7 0x00007ffff6ce17b3 in __libc_start_main () from /lib64/libc.so.6
#8 0x0000000000a43d9e in _start () I've also confirmed that this line added in nodejs/node#37616 is the one causing the crash by removing it and running the test:
Test
Of note it does not crash every time that is called (I added printfs before/after). What I mean is that running the test there are many calls to ClearWeak and only the last one crashses and it do so every time the test is run. |
Just testing out a fix... Will update once tests complete. |
Refs: nodejs/node-addon-api#906 Refs: nodejs#37616 Fix crash introduced by nodejs#37616 Signed-off-by: Michael Dawson <[email protected]>
PR submitted here: nodejs/node#37876 |
Refs: nodejs/node-addon-api#906 Refs: nodejs#37616 Fix crash introduced by nodejs#37616 Signed-off-by: Michael Dawson <[email protected]>
Refs: nodejs/node-addon-api#906 Refs: nodejs#37616 Fix crash introduced by nodejs#37616 Signed-off-by: Michael Dawson <[email protected]> PR-URL: nodejs#37876 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Rich Trott <[email protected]> Reviewed-By: James M Snell <[email protected]>
nodejs/node#37876 fixed the consistent failure, but we are back to an intermittent failure with what seems to be in the threadsafe tests. https://ci.nodejs.org/job/node-test-node-addon-api-new/nodes=osx1015/3828/console Running test 'threadsafe_function/threadsafe_function_existing_tsfn'
Running test 'threadsafe_function/threadsafe_function_ptr'
Running test 'threadsafe_function/threadsafe_function_sum'
Build timed out (after 60 minutes). Marking the build as failed.
Tests aborted with SIGTERM
npm ERR! Test failed. See above for more details.
Build was aborted
Sending e-mails to: [email protected] [email protected] [email protected]
Collecting metadata...
Metadata collection done.
Notifying upstream projects of job completion
Finished: FAILURE I did see the same hang when I ran locally (after 300 runs in one case, but not another). Trying to cut down how long it takes to recreate. Ran just the sum test but no recreate in 1000 runs. Now trying running just the ThreadSafe function tests. |
Refs: nodejs/node-addon-api#906 Refs: nodejs#37616 Fix crash introduced by nodejs#37616 Signed-off-by: Michael Dawson <[email protected]> PR-URL: nodejs#37876 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Rich Trott <[email protected]> Reviewed-By: James M Snell <[email protected]>
Refs: nodejs/node-addon-api#906 Refs: nodejs#37616 Fix crash introduced by nodejs#37616 Signed-off-by: Michael Dawson <[email protected]> PR-URL: nodejs#37876 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Rich Trott <[email protected]> Reviewed-By: James M Snell <[email protected]>
Just running the threadsafe tests I also did not get a recreate in 1000 runs. |
No failures in the CI in the last 5 runs |
Refs: nodejs/node-addon-api#906 Refs: #37616 Fix crash introduced by #37616 Signed-off-by: Michael Dawson <[email protected]> PR-URL: #37876 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Rich Trott <[email protected]> Reviewed-By: James M Snell <[email protected]>
and nodejs/node@fc20e83 to 14.x as well |
Still seems to have crashed 4/299 runs on 14.x. Going to swtich back to master to confirm it is better and if so keep looking for what else I might not have pulled over yet. |
@gabrielschulhof, @legendecas do I remember you mentioning some different in gc between 14.x and 16.x that might be related? |
Well may just have been "unlucky" with earlier run using main, looks like 2 crashes/157 so far. Will let it run longer but may be that it is the Buffer finalizer issue we discussed or possibly something else. |
It might also be possible that I needed a make clean to pick up the change when I moved back to master. |
Looking like having done a full make clean/compile its back to not crashing on main, but will let it run longer this time. I may have had the same issue on 14.x as I did not do a full make clean/compile after pulling in some of the changes from main. |
Well 3 crashes now after having run longer. 3/564 runs so was too optimistic before... |
Added an additional change to Ref/Unref for the SetImmediate cases. The Ref/Unref may be occurring elsewhere, but the stack trace still looks like the env is being deleted before it should. |
Ok no crashes in 2064 runs overnight, so the latest change is looking good with master. The changes I tested in master are in: nodejs/node#38492. |
There were 13 instances of an invalid read of 1 bytes in those 2064 runs, but I believe that is a different issue. For those the stack trace looked like:
|
Thjis is the SecondPassCallback method. Looking for where the invalid read might come from, but not obvious: static void SecondPassCallback(
const v8::WeakCallbackInfo<SecondPassCallParameterRef>& data) {
SecondPassCallParameterRef* parameter = data.GetParameter();
Reference* reference = *parameter;
delete parameter;
if (reference == nullptr) {
// the reference itself has already been deleted so nothing to do
return;
}
reference->Finalize();
} |
Can't remember if I have node compiled for debug. Once the runs to see if the crash is resolved on 14.x as well I'll recompile and see if that gives us a more specific pointer to the line where the invalid read is occurring. |
Ok 3560 runs over the last day or so and no crashes. I think that confirms that the changes in nodejs/node#38492 address the issue in main, and along with the other changes we need to backport to 14.x address the issue in 14.x as well. |
Trying to recreate the non-crash 1byte invalid read with a debug version of Node.js to get a better stack trace. No recreates unfortunately in 1252 runs. I'll let it run overnight... If I still can't recreate I'll go back to refactoring as requested in nodejs/node#38492 and trying to create a test. |
Ok so another 4604 runs with the debug version of Node.js over a few days and no recreate of the 1 byte invalid read. At this point I'm going to focus on getting nodejs/node#38492 into a shape where it can land. We should also start working on getting the other PRs from main backported to master. Those include: |
Refs: nodejs/node-addon-api#906 Ensure that finalization is not defered during shutdown. The env for the addon is deleted immediately after iterating the list of finalizers to be run. Defering causes crashes as the finalization uses the already deleted env. Signed-off-by: Michael Dawson <[email protected]> PR-URL: #38492 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Gabriel Schulhof <[email protected]>
Refs: nodejs/node-addon-api#906 Ensure that finalization is not defered during shutdown. The env for the addon is deleted immediately after iterating the list of finalizers to be run. Defering causes crashes as the finalization uses the already deleted env. Signed-off-by: Michael Dawson <[email protected]> PR-URL: #38492 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Gabriel Schulhof <[email protected]>
Refs: nodejs/node-addon-api#906 Ensure that finalization is not defered during shutdown. The env for the addon is deleted immediately after iterating the list of finalizers to be run. Defering causes crashes as the finalization uses the already deleted env. Signed-off-by: Michael Dawson <[email protected]> PR-URL: #38492 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Gabriel Schulhof <[email protected]>
Refs: nodejs/node-addon-api#906 Ensure that finalization is not defered during shutdown. The env for the addon is deleted immediately after iterating the list of finalizers to be run. Defering causes crashes as the finalization uses the already deleted env. Signed-off-by: Michael Dawson <[email protected]> PR-URL: #38492 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Gabriel Schulhof <[email protected]>
Refs: nodejs/node-addon-api#906 Ensure that finalization is not defered during shutdown. The env for the addon is deleted immediately after iterating the list of finalizers to be run. Defering causes crashes as the finalization uses the already deleted env. Signed-off-by: Michael Dawson <[email protected]> PR-URL: #38492 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Gabriel Schulhof <[email protected]>
Refs: nodejs/node-addon-api#906 Ensure that finalization is not defered during shutdown. The env for the addon is deleted immediately after iterating the list of finalizers to be run. Defering causes crashes as the finalization uses the already deleted env. Signed-off-by: Michael Dawson <[email protected]> PR-URL: #38492 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Gabriel Schulhof <[email protected]>
This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made. |
Should have closed this issue as the PR to resolve was landed. |
Refs: nodejs/node-addon-api#906 Refs: nodejs#37616 Fix crash introduced by nodejs#37616 Signed-off-by: Michael Dawson <[email protected]> PR-URL: nodejs#37876 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Rich Trott <[email protected]> Reviewed-By: James M Snell <[email protected]>
Refs: nodejs/node-addon-api#906 Ensure that finalization is not defered during shutdown. The env for the addon is deleted immediately after iterating the list of finalizers to be run. Defering causes crashes as the finalization uses the already deleted env. Signed-off-by: Michael Dawson <[email protected]> PR-URL: nodejs#38492 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Gabriel Schulhof <[email protected]>
Refs: nodejs/node-addon-api#906 Refs: #37616 Fix crash introduced by #37616 Signed-off-by: Michael Dawson <[email protected]> PR-URL: #37876 Backport-PR-URL: #42512 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Rich Trott <[email protected]> Reviewed-By: James M Snell <[email protected]>
Refs: nodejs/node-addon-api#906 Ensure that finalization is not defered during shutdown. The env for the addon is deleted immediately after iterating the list of finalizers to be run. Defering causes crashes as the finalization uses the already deleted env. Signed-off-by: Michael Dawson <[email protected]> PR-URL: #38492 Backport-PR-URL: #42512 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Gabriel Schulhof <[email protected]>
Failure in recent nightly run: @gabrielschulhof, @KevinEady as our threadsafe experts.
https://ci.nodejs.org/job/node-test-node-addon-api-new/nodes=fedora-latest-x64/3589/console
The text was updated successfully, but these errors were encountered: