Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intermittent failure in ThreadSafe function #906

Closed
Tracked by #907
mhdawson opened this issue Feb 12, 2021 · 75 comments
Closed
Tracked by #907

Intermittent failure in ThreadSafe function #906

mhdawson opened this issue Feb 12, 2021 · 75 comments
Labels

Comments

@mhdawson
Copy link
Member

Failure in recent nightly run: @gabrielschulhof, @KevinEady as our threadsafe experts.

https://ci.nodejs.org/job/node-test-node-addon-api-new/nodes=fedora-latest-x64/3589/console

Running test 'threadsafe_function/threadsafe_function'


#
# Fatal error in , line 0
# Check failed: !object.IsSmi().
#
#
#
#FailureMessage Object: 0x7ffcf7a31110
 1: 0xae7a01  [/home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/node-v16.0.0-nightly20210209beee53884d-linux-x64/bin/node]
 2: 0x1b043b4 V8_Fatal(char const*, ...) [/home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/node-v16.0.0-nightly20210209beee53884d-linux-x64/bin/node]
 3: 0xea7e92 v8::internal::PagedSpace::AllocateRawSlow(int, v8::internal::AllocationAlignment, v8::internal::AllocationOrigin) [/home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/node-v16.0.0-nightly20210209beee53884d-linux-x64/bin/node]
 4: 0xe3d8a6 v8::internal::Heap::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/node-v16.0.0-nightly20210209beee53884d-linux-x64/bin/node]
 5: 0xe455c8 v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/node-v16.0.0-nightly20210209beee53884d-linux-x64/bin/node]
 6: 0xe0925a v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [/home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/node-v16.0.0-nightly20210209beee53884d-linux-x64/bin/node]
 7: 0xe04c72 v8::internal::FactoryBase<v8::internal::Factory>::AllocateRawArray(int, v8::internal::AllocationType) [/home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/node-v16.0.0-nightly20210209beee53884d-linux-x64/bin/node]
 8: 0xe04d24 v8::internal::FactoryBase<v8::internal::Factory>::NewFixedArrayWithFiller(v8::internal::Handle<v8::internal::Map>, int, v8::internal::Handle<v8::internal::Oddball>, v8::internal::AllocationType) [/home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/node-v16.0.0-nightly20210209beee53884d-linux-x64/bin/node]
 9: 0x1070bae v8::internal::Handle<v8::internal::FixedArray> v8::internal::String::CalculateLineEnds<v8::internal::Isolate>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::String>, bool) [/home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/node-v16.0.0-nightly20210209beee53884d-linux-x64/bin/node]
10: 0x1031483 void v8::internal::Script::InitLineEnds<v8::internal::Isolate>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Script>) [/home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/node-v16.0.0-nightly20210209beee53884d-linux-x64/bin/node]
11: 0x104554b v8::internal::Script::GetLineNumber(v8::internal::Handle<v8::internal::Script>, int) [/home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/node-v16.0.0-nightly20210209beee53884d-linux-x64/bin/node]
12: 0xdd5fc3 v8::internal::JSStackFrame::GetLineNumber() [/home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/node-v16.0.0-nightly20210209beee53884d-linux-x64/bin/node]
13: 0xe13623 v8::internal::Factory::NewStackFrameInfo(v8::internal::Handle<v8::internal::FrameArray>, int) [/home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/node-v16.0.0-nightly20210209beee53884d-linux-x64/bin/node]
14: 0x1060928 v8::internal::StackTraceFrame::InitializeFrameInfo(v8::internal::Handle<v8::internal::StackTraceFrame>) [/home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/node-v16.0.0-nightly20210209beee53884d-linux-x64/bin/node]
15: 0x1060b00 v8::internal::StackTraceFrame::GetFrameInfo(v8::internal::Handle<v8::internal::StackTraceFrame>) [/home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/node-v16.0.0-nightly20210209beee53884d-linux-x64/bin/node]
16: 0x1063c85 v8::internal::SerializeStackTraceFrame(v8::internal::Isolate*, v8::internal::Handle<v8::internal::StackTraceFrame>, v8::internal::IncrementalStringBuilder*) [/home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/node-v16.0.0-nightly20210209beee53884d-linux-x64/bin/node]
17: 0x106480c v8::internal::SerializeStackTraceFrame(v8::internal::Isolate*, v8::internal::Handle<v8::internal::StackTraceFrame>) [/home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/node-v16.0.0-nightly20210209beee53884d-linux-x64/bin/node]
18: 0xcdb3d4  [/home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/node-v16.0.0-nightly20210209beee53884d-linux-x64/bin/node]
19: 0xce32f6 v8::internal::Builtin_CallSitePrototypeToString(int, unsigned long*, v8::internal::Isolate*) [/home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/node-v16.0.0-nightly20210209beee53884d-linux-x64/bin/node]
20: 0x1505f79  [/home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/node-v16.0.0-nightly20210209beee53884d-linux-x64/bin/node]
Tests aborted with SIGILL
npm ERR! code 1
npm ERR! path /home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/node-addon-api
npm ERR! command failed
npm ERR! command sh -c node test

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/iojs/build/workspace/node-test-node-addon-api-new/nodes/fedora-latest-x64/npm-cache/_logs/2021-02-10T09_48_16_307Z-debug.log
Build step 'Conditional steps (multiple)' marked build as failure
Sending e-mails to: [email protected] [email protected] [email protected]
Collecting metadata...
Metadata collection done.
Notifying upstream projects of job completion
Finished: FAILURE
@mhdawson
Copy link
Member Author

mhdawson commented Feb 12, 2021

Also seeing occasional failure like this one: https://ci.nodejs.org/job/node-test-node-addon-api-new/nodes=ubuntu1604-64/3564/console

Running test 'threadsafe_function/threadsafe_function'
Running test 'threadsafe_function/threadsafe_function_ctx'
Running test 'threadsafe_function/threadsafe_function_existing_tsfn'
Running test 'threadsafe_function/threadsafe_function_ptr'
Running test 'threadsafe_function/threadsafe_function_sum'
Running test 'threadsafe_function/threadsafe_function_unref'
Running test 'typed_threadsafe_function/typed_threadsafe_function'
Running test 'typed_threadsafe_function/typed_threadsafe_function_ctx'
Running test 'typed_threadsafe_function/typed_threadsafe_function_existing_tsfn'
Running test 'typed_threadsafe_function/typed_threadsafe_function_ptr'
Running test 'typed_threadsafe_function/typed_threadsafe_function_sum'
Running test 'typed_threadsafe_function/typed_threadsafe_function_unref'
Running test 'typedarray-bigint'
Running test 'typedarray'
Running test 'version_management'

All tests passed!
Tests aborted with SIGSEGV
npm ERR! code 1
npm ERR! path /home/iojs/build/workspace/node-test-node-addon-api-new/nodes/ubuntu1604-64/node-addon-api
npm ERR! command failed
npm ERR! command sh -c node test

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/iojs/build/workspace/node-test-node-addon-api-new/nodes/ubuntu1604-64/npm-cache/_logs/2021-02-06T09_34_12_368Z-debug.log
Build step 'Conditional steps (multiple)' marked build as failure
Sending e-mails to: [email protected] [email protected] [email protected]
Collecting metadata...
Metadata collection done.
Notifying upstream projects of job completion
Finished: FAILURE

@mhdawson mhdawson mentioned this issue Feb 12, 2021
1 task
@gabrielschulhof
Copy link
Contributor

@mhdawson this looks like it's against Node.js nightly. Might it be a regression in core?

@gabrielschulhof
Copy link
Contributor

The stack doesn't seem to contain any of our frames – unless I'm glossing over some.

@gabrielschulhof
Copy link
Contributor

I built against nodejs/node@1c6484b and got

Running test 'threadsafe_function/threadsafe_function'
/home/nix/node/node/out/Release/node[3813]: ../src/api/callback.cc:121:void node::InternalCallbackScope::Close(): Assertion `(env_->execution_async_id()) == (0)' failed.
 1: 0xabc3c0 node::Abort() [/home/nix/node/node/out/Release/node]
 2: 0xabc43e  [/home/nix/node/node/out/Release/node]
 3: 0x9fcfe8 node::InternalCallbackScope::Close() [/home/nix/node/node/out/Release/node]
 4: 0x9fd3a3 node::InternalMakeCallback(node::Environment*, v8::Local<v8::Object>, v8::Local<v8::Object>, v8::Local<v8::Function>, int, v8::Local<v8::Value>*, node::async_context) [/home/nix/node/node/out/Release/node]
 5: 0x9fd52c node::MakeCallback(v8::Isolate*, v8::Local<v8::Object>, v8::Local<v8::Function>, int, v8::Local<v8::Value>*, node::async_context) [/home/nix/node/node/out/Release/node]
 6: 0xa3dd13 node::Environment::CheckImmediate(uv_check_s*) [/home/nix/node/node/out/Release/node]
 7: 0x155f1b9  [/home/nix/node/node/out/Release/node]
 8: 0x15578b4 uv_run [/home/nix/node/node/out/Release/node]
 9: 0x9fe64e node::SpinEventLoop(node::Environment*) [/home/nix/node/node/out/Release/node]
10: 0xb04537 node::NodeMainInstance::Run(node::EnvSerializeInfo const*) [/home/nix/node/node/out/Release/node]
11: 0xa7fc50 node::Start(int, char**) [/home/nix/node/node/out/Release/node]
12: 0x7fea27c82f43 __libc_start_main [/lib64/libc.so.6]
13: 0x9faaee _start [/home/nix/node/node/out/Release/node]
Tests aborted with SIGABRT

@gabrielschulhof
Copy link
Contributor

Hmmm ... I can't reproduce that crash.

@gabrielschulhof
Copy link
Contributor

Trying to reproduce inside an Ubuntu 16.04 container.

@mhdawson
Copy link
Member Author

The crash on shutdown with master ow seems to recreate consistently:

All tests passed!

Thread 1 "node" received signal SIGSEGV, Segmentation fault.
0x0000000000e7acb0 in v8::internal::GlobalHandles::ClearWeakness(unsigned long*) ()
Missing separate debuginfos, use: yum debuginfo-install libgcc-8.3.1-5.1.el8.x86_64 libstdc++-8.3.1-5.1.el8.x86_64
(gdb) 

@gabrielschulhof I'm wondering if that might be related to the change from nodejs/node#37616. Any chance you still have the environment you had for that PR to compare before/after the change?

@mhdawson
Copy link
Member Author

Cutting down the test case this is the one that is failing:

Starting program: /home/midawson/node/node-v16.0.0-nightly202103215318e53fd8-linux-x64/bin/node --expose-gc --no-concurrent-array-buffer-sweeping /home/midawson/newpull/land/node-addon-api/test/objectreference.js
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-127.el8.x86_64
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff6cbd700 (LWP 334846)]
[New Thread 0x7ffff64bc700 (LWP 334847)]
[New Thread 0x7ffff5cbb700 (LWP 334848)]
[New Thread 0x7ffff54ba700 (LWP 334849)]
[New Thread 0x7ffff4cb9700 (LWP 334850)]
[New Thread 0x7ffff7ff6700 (LWP 334851)]

Thread 1 "node" received signal SIGSEGV, Segmentation fault.
0x0000000000e7acb0 in v8::internal::GlobalHandles::ClearWeakness(unsigned long*) ()
Missing separate debuginfos, use: yum debuginfo-install libgcc-8.3.1-5.1.el8.x86_64 libstdc++-8.3.1-5.1.el8.x86_64
(gdb) 

@mhdawson
Copy link
Member Author

Cut down test case to these 2 and it still crashes:

  () => {
      binding.objectreference.setObjects("hello", "world", "javascript");
      const test = binding.objectreference.getFromValue();
      const test2 = binding.objectreference.getFromValue("hello");

      assert.deepEqual({ hello: "world" }, test);
      assert.deepEqual({ hello: "world" }, test2);
      assert.deepEqual(test, test2);
    },
    () => {
      binding.objectreference.setObjects(1, "hello world");
      const test = binding.objectreference.getFromValue();
      const test2 = binding.objectreference.getFromGetter(1);

      assert.deepEqual({ 1: "hello world"}, test);
      assert.equal("hello world", test2);
      assert.equal(test[1], test2);
    },

@mhdawson
Copy link
Member Author

But not with just one of the 2

@mhdawson
Copy link
Member Author

Just these 2 also seems to recreate

() => {
      binding.objectreference.setObjects(1, "hello world");
      const test = binding.objectreference.getFromValue();
      const test2 = binding.objectreference.getFromGetter(1);

      assert.deepEqual({ 1: "hello world"}, test);
      assert.equal("hello world", test2);
      assert.equal(test[1], test2);
    },

    () => {
      binding.objectreference.setObjects(0, "hello");
      binding.objectreference.setObjects(1, "world");
      const test = binding.objectreference.getFromValue();
      const test2 = binding.objectreference.getFromGetter(0);
      const test3 = binding.objectreference.getFromGetter(1);

      assert.deepEqual({ 1: "world"}, test);
      assert.equal(undefined, test2);
      assert.equal("world", test3);
    },

@mhdawson
Copy link
Member Author

May require 2 tests so that a gc is forced after one of the tests complete ?

@mhdawson
Copy link
Member Author

Cutting down the tests to just:

// info[0] is the key, which can be either a string or a number.
// info[1] is the value.
// info[2] is a flag that differentiates whether the key is a
// C string or a JavaScript string.
void SetObjects(const CallbackInfo& info) {
  Env env = info.Env();
  HandleScope scope(env);

 weak = Weak(Object::New(env));
 weak.SuppressDestruct();

 /* persistent = Persistent(Object::New(env));
  persistent.SuppressDestruct();

  reference = Reference<Object>::New(Object::New(env), 2);
  reference.SuppressDestruct();
  */

  if (info[0].IsString()) {
    if (info[2].As<String>() == String::New(env, "javascript")) {
//      weak.Set(info[0].As<String>(), info[1]);
//      persistent.Set(info[0].As<String>(), info[1]);
//      reference.Set(info[0].As<String>(), info[1]);
    } else {
      weak.Set(info[0].As<String>().Utf8Value(), info[1]);
      persistent.Set(info[0].As<String>().Utf8Value(), info[1]);
      reference.Set(info[0].As<String>().Utf8Value(), info[1]);
    }
  } else if (info[0].IsNumber()) {
    weak.Set(info[0].As<Number>(), info[1]);
    persistent.Set(info[0].As<Number>(), info[1]);
    reference.Set(info[0].As<Number>(), info[1]);
  }
}
    () => {
      binding.objectreference.setObjects("hello", "world", "javascript");
 //     const test = binding.objectreference.getFromValue();
 //     const test2 = binding.objectreference.getFromValue("hello");

   //   assert.deepEqual({ hello: "world" }, test);
//      assert.deepEqual({ hello: "world" }, test2);
  //    assert.deepEqual(test, test2);
    },
    () => {
//      binding.objectreference.setObjects(1, "hello world");
//      const test = binding.objectreference.getFromValue();
//      const test2 = binding.objectreference.getFromGetter(1);

//      assert.deepEqual({ 1: "hello world"}, test);
//      assert.equal("hello world", test2);
//      assert.equal(test[1], test2);
    },

still recreates

So it seems like just creating a Weak causes it? The strange thing is that I would have thought the C node-api tests would have caught an issue like that.

@mhdawson
Copy link
Member Author

I've got to think its related to nodejs/node#37616, give the output of the crash being:Thread 1 "node" received signal SIGSEGV, Segmentation fault.
0x0000000000e7acb0 in v8::internal::GlobalHandles::ClearWeakness(unsigned long*) ()


@mhdawson
Copy link
Member Author

I'm done today and booked for most of tomorrow. @gabrielschulhof I also changed nodejs/node#37802 (review) to "requested changes" so that we don't land until we figure this out.

@mhdawson
Copy link
Member Author

mhdawson commented Mar 23, 2021

More complete stack trace:

000000e7c930 in v8::internal::GlobalHandles::ClearWeakness(unsigned long*) ()
Missing separate debuginfos, use: yum debuginfo-install libgcc-8.3.1-5.1.el8.x86_64 libstdc++-8.3.1-5.1.el8.x86_64
(gdb) bt
#0  0x0000000000e7c930 in v8::internal::GlobalHandles::ClearWeakness(unsigned long*) ()
#1  0x0000000000aaaaff in v8impl::(anonymous namespace)::Reference::Finalize(bool) ()
#2  0x0000000000ac9c17 in node_napi_env__::~node_napi_env__() ()
#3  0x0000000000a924cb in node::Environment::RunCleanup() ()
#4  0x0000000000a486ac in node::FreeEnvironment(node::Environment*) ()
#5  0x0000000000b3dd8f in node::NodeMainInstance::Run(node::EnvSerializeInfo const*) ()
#6  0x0000000000ac3772 in node::Start(int, char**) ()
#7  0x00007ffff6ce17b3 in __libc_start_main () from /lib64/libc.so.6
#8  0x0000000000a43d9e in _start ()

I've also confirmed that this line added in nodejs/node#37616 is the one causing the crash by removing it and running the test:

if (is_env_teardown) _persistent.ClearWeak();

Test

node --expose-gc  test/objectreference.js

Of note it does not crash every time that is called (I added printfs before/after). What I mean is that running the test there are many calls to ClearWeak and only the last one crashses and it do so every time the test is run.

@mhdawson
Copy link
Member Author

Just testing out a fix... Will update once tests complete.

mhdawson added a commit to mhdawson/io.js that referenced this issue Mar 23, 2021
Refs: nodejs/node-addon-api#906
Refs: nodejs#37616

Fix crash introduced by nodejs#37616

Signed-off-by: Michael Dawson <[email protected]>
@mhdawson
Copy link
Member Author

PR submitted here: nodejs/node#37876

Trott pushed a commit to Trott/io.js that referenced this issue Mar 24, 2021
Refs: nodejs/node-addon-api#906
Refs: nodejs#37616

Fix crash introduced by nodejs#37616

Signed-off-by: Michael Dawson <[email protected]>
Trott pushed a commit to mhdawson/io.js that referenced this issue Mar 24, 2021
Refs: nodejs/node-addon-api#906
Refs: nodejs#37616

Fix crash introduced by nodejs#37616

Signed-off-by: Michael Dawson <[email protected]>

PR-URL: nodejs#37876
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: James M Snell <[email protected]>
@mhdawson
Copy link
Member Author

nodejs/node#37876 fixed the consistent failure, but we are back to an intermittent failure with what seems to be in the threadsafe tests.

https://ci.nodejs.org/job/node-test-node-addon-api-new/nodes=osx1015/3828/console

Running test 'threadsafe_function/threadsafe_function_existing_tsfn'
Running test 'threadsafe_function/threadsafe_function_ptr'
Running test 'threadsafe_function/threadsafe_function_sum'
Build timed out (after 60 minutes). Marking the build as failed.
Tests aborted with SIGTERM
npm ERR! Test failed.  See above for more details.
Build was aborted
Sending e-mails to: [email protected] [email protected] [email protected]
Collecting metadata...
Metadata collection done.
Notifying upstream projects of job completion
Finished: FAILURE

I did see the same hang when I ran locally (after 300 runs in one case, but not another).

Trying to cut down how long it takes to recreate. Ran just the sum test but no recreate in 1000 runs. Now trying running just the ThreadSafe function tests.

gabrielschulhof pushed a commit to gabrielschulhof/node that referenced this issue Mar 26, 2021
Refs: nodejs/node-addon-api#906
Refs: nodejs#37616

Fix crash introduced by nodejs#37616

Signed-off-by: Michael Dawson <[email protected]>

PR-URL: nodejs#37876
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: James M Snell <[email protected]>
gabrielschulhof pushed a commit to gabrielschulhof/node that referenced this issue Mar 26, 2021
Refs: nodejs/node-addon-api#906
Refs: nodejs#37616

Fix crash introduced by nodejs#37616

Signed-off-by: Michael Dawson <[email protected]>

PR-URL: nodejs#37876
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: James M Snell <[email protected]>
@mhdawson
Copy link
Member Author

Just running the threadsafe tests I also did not get a recreate in 1000 runs.

@mhdawson
Copy link
Member Author

No failures in the CI in the last 5 runs

ruyadorno pushed a commit to nodejs/node that referenced this issue Mar 29, 2021
Refs: nodejs/node-addon-api#906
Refs: #37616

Fix crash introduced by #37616

Signed-off-by: Michael Dawson <[email protected]>

PR-URL: #37876
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: James M Snell <[email protected]>
@mhdawson
Copy link
Member Author

mhdawson commented May 1, 2021

and nodejs/node@fc20e83 to 14.x as well

@mhdawson
Copy link
Member Author

mhdawson commented May 1, 2021

Still seems to have crashed 4/299 runs on 14.x. Going to swtich back to master to confirm it is better and if so keep looking for what else I might not have pulled over yet.

@mhdawson
Copy link
Member Author

mhdawson commented May 1, 2021

@gabrielschulhof, @legendecas do I remember you mentioning some different in gc between 14.x and 16.x that might be related?

@mhdawson
Copy link
Member Author

mhdawson commented May 1, 2021

Well may just have been "unlucky" with earlier run using main, looks like 2 crashes/157 so far. Will let it run longer but may be that it is the Buffer finalizer issue we discussed or possibly something else.

@mhdawson
Copy link
Member Author

mhdawson commented May 1, 2021

It might also be possible that I needed a make clean to pick up the change when I moved back to master.

@mhdawson
Copy link
Member Author

mhdawson commented May 1, 2021

Looking like having done a full make clean/compile its back to not crashing on main, but will let it run longer this time. I may have had the same issue on 14.x as I did not do a full make clean/compile after pulling in some of the changes from main.

@mhdawson
Copy link
Member Author

mhdawson commented May 1, 2021

Well 3 crashes now after having run longer. 3/564 runs so was too optimistic before...

@mhdawson
Copy link
Member Author

mhdawson commented May 2, 2021

Added an additional change to Ref/Unref for the SetImmediate cases. The Ref/Unref may be occurring elsewhere, but the stack trace still looks like the env is being deleted before it should.

@mhdawson
Copy link
Member Author

mhdawson commented May 2, 2021

Ok no crashes in 2064 runs overnight, so the latest change is looking good with master. The changes I tested in master are in: nodejs/node#38492.

@mhdawson
Copy link
Member Author

mhdawson commented May 2, 2021

There were 13 instances of an invalid read of 1 bytes in those 2064 runs, but I believe that is a different issue.

For those the stack trace looked like:

All tests passed!
==3333742== Invalid read of size 1
==3333742==    at 0xAC3B3B: v8impl::(anonymous namespace)::Reference::SecondPassCallback(v8::WeakCallbackInfo<v8impl::(anonymous namespace)::Reference*> const&) (in /home/midawson/newpull/land/node/out/Release/node)
==3333742==    by 0xE99E5F: v8::internal::GlobalHandles::InvokeSecondPassPhantomCallbacks() [clone .part.129] (in /home/midawson/newpull/land/node/out/Release/node)
==3333742==    by 0xE9A9D7: v8::internal::GlobalHandles::InvokeSecondPassPhantomCallbacksFromTask() (in /home/midawson/newpull/land/node/out/Release/node)
==3333742==    by 0xB821D3: node::PerIsolatePlatformData::RunForegroundTask(std::unique_ptr<v8::Task, std::default_delete<v8::Task> >) (in /home/midawson/newpull/land/node/out/Release/node)
==3333742==    by 0xB84048: node::PerIsolatePlatformData::FlushForegroundTasksInternal() (in /home/midawson/newpull/land/node/out/Release/node)
==3333742==    by 0x15833D5: uv__async_io.part.1 (async.c:163)
==3333742==    by 0x1596155: uv__io_poll (linux-core.c:462)
==3333742==    by 0x1583D07: uv_run (core.c:385)
==3333742==    by 0xA9E00F: node::Environment::CleanupHandles() (in /home/midawson/newpull/land/node/out/Release/node)
==3333742==    by 0xAAB1F9: node::Environment::RunCleanup() (in /home/midawson/newpull/land/node/out/Release/node)
==3333742==    by 0xA5A73B: node::FreeEnvironment(node::Environment*) (in /home/midawson/newpull/land/node/out/Release/node)
==3333742==    by 0xB5778E: node::NodeMainInstance::Run(node::EnvSerializeInfo const*) (in /home/midawson/newpull/land/node/out/Release/node)
==3333742==  Address 0x18388451 is 65 bytes inside a block of size 88 free'd
==3333742==    at 0x54777D6: operator delete(void*, unsigned long) (vg_replace_malloc.c:593)
==3333742==    by 0xAC3904: v8impl::(anonymous namespace)::Reference::Finalize(bool) (in /home/midawson/newpull/land/node/out/Release/node)
==3333742==    by 0xAE323E: node_napi_env__::~node_napi_env__() (in /home/midawson/newpull/land/node/out/Release/node)
==3333742==    by 0xAAB14A: node::Environment::RunCleanup() (in /home/midawson/newpull/land/node/out/Release/node)
==3333742==    by 0xA5A73B: node::FreeEnvironment(node::Environment*) (in /home/midawson/newpull/land/node/out/Release/node)
==3333742==    by 0xB5778E: node::NodeMainInstance::Run(node::EnvSerializeInfo const*) (in /home/midawson/newpull/land/node/out/Release/node)
==3333742==    by 0xADCC01: node::Start(int, char**) (in /home/midawson/newpull/land/node/out/Release/node)
==3333742==    by 0x63F97B2: (below main) (in /usr/lib64/libc-2.28.so)
==3333742==  Block was alloc'd at
==3333742==    at 0x5476586: operator new(unsigned long) (vg_replace_malloc.c:342)
==3333742==    by 0xACA273: napi_wrap (in /home/midawson/newpull/land/node/out/Release/node)
==3333742==    by 0xA81AF76: Napi::ObjectWrap<(anonymous namespace)::TSFNWrap>::ObjectWrap(Napi::CallbackInfo const&) (napi-inl.h:3608)
==3333742==    by 0xA81AB91: (anonymous namespace)::TSFNWrap::TSFNWrap(Napi::CallbackInfo const&) (threadsafe_function_ctx.cc:41)
==3333742==    by 0xA81C2A2: Napi::ObjectWrap<(anonymous namespace)::TSFNWrap>::ConstructorCallbackWrapper(napi_env__*, napi_callback_info__*)::{lambda()#1}::operator()() const (napi-inl.h:3967)
==3333742==    by 0xA81CCE4: napi_value__* Napi::details::WrapCallback<Napi::ObjectWrap<(anonymous namespace)::TSFNWrap>::ConstructorCallbackWrapper(napi_env__*, napi_callback_info__*)::{lambda()#1}>(Napi::ObjectWrap<(anonymous namespace)::TSFNWrap>::ConstructorCallbackWrapper(napi_env__*, napi_callback_info__*)::{lambda()#1}) (napi-inl.h:74)
==3333742==    by 0xA81C3B6: Napi::ObjectWrap<(anonymous namespace)::TSFNWrap>::ConstructorCallbackWrapper(napi_env__*, napi_callback_info__*) (napi-inl.h:3965)
==3333742==    by 0xAC358E: v8impl::(anonymous namespace)::FunctionCallbackWrapper::Invoke(v8::FunctionCallbackInfo<v8::Value> const&) (in /home/midawson/newpull/land/node/out/Release/node)
==3333742==    by 0xD64544: v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<true>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) (in /home/midawson/newpull/land/node/out/Release/node)
==3333742==    by 0xD64B2E: v8::internal::Builtin_Impl_HandleApiCall(v8::internal::BuiltinArguments, v8::internal::Isolate*) (in /home/midawson/newpull/land/node/out/Release/node)
==3333742==    by 0xD65155: v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) (in /home/midawson/newpull/land/node/out/Release/node)
==3333742==    by 0x1606B78: Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit (in /home/midawson/newpull/land/node/out/Release/node)
==3333742==
==3333742==

@mhdawson
Copy link
Member Author

mhdawson commented May 2, 2021

Thjis is the SecondPassCallback method. Looking for where the invalid read might come from, but not obvious:

 static void SecondPassCallback(
      const v8::WeakCallbackInfo<SecondPassCallParameterRef>& data) {
    SecondPassCallParameterRef* parameter = data.GetParameter();
    Reference* reference = *parameter;
    delete parameter;
    if (reference == nullptr) {
      // the reference itself has already been deleted so nothing to do
      return;
    }
    reference->Finalize();
  }

@mhdawson
Copy link
Member Author

mhdawson commented May 2, 2021

Can't remember if I have node compiled for debug. Once the runs to see if the crash is resolved on 14.x as well I'll recompile and see if that gives us a more specific pointer to the line where the invalid read is occurring.

@mhdawson
Copy link
Member Author

mhdawson commented May 3, 2021

Ok 3560 runs over the last day or so and no crashes. I think that confirms that the changes in nodejs/node#38492 address the issue in main, and along with the other changes we need to backport to 14.x address the issue in 14.x as well.

@mhdawson
Copy link
Member Author

mhdawson commented May 3, 2021

Trying to recreate the non-crash 1byte invalid read with a debug version of Node.js to get a better stack trace. No recreates unfortunately in 1252 runs. I'll let it run overnight...

If I still can't recreate I'll go back to refactoring as requested in nodejs/node#38492 and trying to create a test.

@mhdawson
Copy link
Member Author

mhdawson commented May 4, 2021

Ok so another 4604 runs with the debug version of Node.js over a few days and no recreate of the 1 byte invalid read. At this point I'm going to focus on getting nodejs/node#38492 into a shape where it can land.

We should also start working on getting the other PRs from main backported to master. Those include:

mhdawson added a commit to nodejs/node that referenced this issue May 20, 2021
Refs: nodejs/node-addon-api#906

Ensure that finalization is not defered during shutdown.
The env for the addon is deleted immediately after
iterating the list of finalizers to be run. Defering
causes crashes as the finalization uses the already
deleted env.

Signed-off-by: Michael Dawson <[email protected]>

PR-URL: #38492
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Chengzhong Wu <[email protected]>
Reviewed-By: Gabriel Schulhof <[email protected]>
targos pushed a commit to nodejs/node that referenced this issue May 30, 2021
Refs: nodejs/node-addon-api#906

Ensure that finalization is not defered during shutdown.
The env for the addon is deleted immediately after
iterating the list of finalizers to be run. Defering
causes crashes as the finalization uses the already
deleted env.

Signed-off-by: Michael Dawson <[email protected]>

PR-URL: #38492
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Chengzhong Wu <[email protected]>
Reviewed-By: Gabriel Schulhof <[email protected]>
danielleadams pushed a commit to nodejs/node that referenced this issue May 31, 2021
Refs: nodejs/node-addon-api#906

Ensure that finalization is not defered during shutdown.
The env for the addon is deleted immediately after
iterating the list of finalizers to be run. Defering
causes crashes as the finalization uses the already
deleted env.

Signed-off-by: Michael Dawson <[email protected]>

PR-URL: #38492
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Chengzhong Wu <[email protected]>
Reviewed-By: Gabriel Schulhof <[email protected]>
targos pushed a commit to nodejs/node that referenced this issue Jun 5, 2021
Refs: nodejs/node-addon-api#906

Ensure that finalization is not defered during shutdown.
The env for the addon is deleted immediately after
iterating the list of finalizers to be run. Defering
causes crashes as the finalization uses the already
deleted env.

Signed-off-by: Michael Dawson <[email protected]>

PR-URL: #38492
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Chengzhong Wu <[email protected]>
Reviewed-By: Gabriel Schulhof <[email protected]>
targos pushed a commit to nodejs/node that referenced this issue Jun 5, 2021
Refs: nodejs/node-addon-api#906

Ensure that finalization is not defered during shutdown.
The env for the addon is deleted immediately after
iterating the list of finalizers to be run. Defering
causes crashes as the finalization uses the already
deleted env.

Signed-off-by: Michael Dawson <[email protected]>

PR-URL: #38492
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Chengzhong Wu <[email protected]>
Reviewed-By: Gabriel Schulhof <[email protected]>
targos pushed a commit to nodejs/node that referenced this issue Jun 11, 2021
Refs: nodejs/node-addon-api#906

Ensure that finalization is not defered during shutdown.
The env for the addon is deleted immediately after
iterating the list of finalizers to be run. Defering
causes crashes as the finalization uses the already
deleted env.

Signed-off-by: Michael Dawson <[email protected]>

PR-URL: #38492
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Chengzhong Wu <[email protected]>
Reviewed-By: Gabriel Schulhof <[email protected]>
@github-actions
Copy link
Contributor

github-actions bot commented Aug 3, 2021

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

@github-actions github-actions bot added the stale label Aug 3, 2021
@mhdawson
Copy link
Member Author

mhdawson commented Aug 3, 2021

Should have closed this issue as the PR to resolve was landed.

@mhdawson mhdawson closed this as completed Aug 3, 2021
legendecas pushed a commit to legendecas/node that referenced this issue Mar 29, 2022
Refs: nodejs/node-addon-api#906
Refs: nodejs#37616

Fix crash introduced by nodejs#37616

Signed-off-by: Michael Dawson <[email protected]>

PR-URL: nodejs#37876
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: James M Snell <[email protected]>
legendecas pushed a commit to legendecas/node that referenced this issue Mar 29, 2022
Refs: nodejs/node-addon-api#906

Ensure that finalization is not defered during shutdown.
The env for the addon is deleted immediately after
iterating the list of finalizers to be run. Defering
causes crashes as the finalization uses the already
deleted env.

Signed-off-by: Michael Dawson <[email protected]>

PR-URL: nodejs#38492
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Chengzhong Wu <[email protected]>
Reviewed-By: Gabriel Schulhof <[email protected]>
richardlau pushed a commit to nodejs/node that referenced this issue Mar 30, 2022
Refs: nodejs/node-addon-api#906
Refs: #37616

Fix crash introduced by #37616

Signed-off-by: Michael Dawson <[email protected]>

PR-URL: #37876
Backport-PR-URL: #42512
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: James M Snell <[email protected]>
richardlau pushed a commit to nodejs/node that referenced this issue Mar 30, 2022
Refs: nodejs/node-addon-api#906

Ensure that finalization is not defered during shutdown.
The env for the addon is deleted immediately after
iterating the list of finalizers to be run. Defering
causes crashes as the finalization uses the already
deleted env.

Signed-off-by: Michael Dawson <[email protected]>

PR-URL: #38492
Backport-PR-URL: #42512
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Chengzhong Wu <[email protected]>
Reviewed-By: Gabriel Schulhof <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants