-
Notifications
You must be signed in to change notification settings - Fork 462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Intermittent crashes in CI #1022
Comments
I think I've managed to recreate a similar instance and the stack trace looks like: sing host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/home/midawson/newpull/land/node/out/Release/node --expose-gc --no-concurrent-a'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000e9ffe1 in v8::internal::GlobalHandles::Destroy(unsigned long*) ()
[Current thread is 1 (Thread 0x7fda22e537c0 (LWP 2218825))]
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-127.el8.x86_64 libgcc-8.3.1-5.1.el8.x86_64 libstdc++-8.3.1-5.1.el8.x86_64
(gdb) bt
#0 0x0000000000e9ffe1 in v8::internal::GlobalHandles::Destroy(unsigned long*) ()
#1 0x0000000000ad98dd in napi_resolve_deferred ()
#2 0x00007fda20085ebd in Napi::Promise::Deferred::Resolve (this=0x5935dc0, value=0x5945090) at ../../napi-inl.h:2170
#3 0x00007fda200b272c in (anonymous namespace)::TSFNWrap::<lambda(Napi::Env, Napi::Reference<Napi::Value>*)>::operator()(Napi::Env, Napi::Reference<Napi::Value> *) const (__closure=0x5939dc8, env=..., ctx=0x5a09dc0)
at ../threadsafe_function/threadsafe_function_ctx.cc:51
#4 0x00007fda200b386b in Napi::details::ThreadSafeFinalize<Napi::Reference<Napi::Value>, (anonymous namespace)::TSFNWrap::TSFNWrap(const Napi::CallbackInfo&)::<lambda(Napi::Env, Napi::Reference<Napi::Value>*)>, void>::FinalizeWrapperWithContext(napi_env, void *, void *) (env=0x5979870, rawFinalizeData=0x5939dc0, rawContext=0x5a09dc0) at ../../napi-inl.h:245
#5 0x0000000000aeb97d in node::Environment::CloseHandle<uv_handle_s, v8impl::(anonymous namespace)::ThreadSafeFunction::CloseHandlesAndMaybeDelete(bool)::{lambda(uv_handle_s*)#1}>(uv_handle_s*, v8impl::(anonymous namespace)::ThreadSafeFunction::CloseHandlesAndMaybeDelete(bool)::{lambda(uv_handle_s*)#1})::{lambda(uv_handle_s*)#1}::_FUN(uv_handle_s*) ()
#6 0x00000000015b4155 in uv__finish_close (handle=0x596ed78) at ../deps/uv/src/unix/core.c:303
#7 uv__run_closing_handles (loop=0x487e080 <default_loop_struct>) at ../deps/uv/src/unix/core.c:317
#8 uv_run (loop=0x487e080 <default_loop_struct>, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:395
#9 0x0000000000a63ed5 in node::SpinEventLoop(node::Environment*) () at ../deps/uv/src/unix/core.c:854
#10 0x0000000000b64546 in node::NodeMainInstance::Run(node::EnvSerializeInfo const*) () at ../deps/uv/src/unix/core.c:854
#11 0x0000000000ae9152 in node::Start(int, char**) () at ../deps/uv/src/unix/core.c:854
#12 0x00007fda21b4a7b3 in __libc_start_main () from /lib64/libc.so.6
#13 0x0000000000a619ce in _start () at ../deps/uv/src/unix/core.c:854
(gdb) Quit
|
And another one: warning: Loadable section ".note.gnu.property" outside of ELF segments
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/home/midawson/newpull/land/node/out/Release/node --expose-gc --no-concurrent-a'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000ea21e8 in v8::internal::GlobalHandles::Create(v8::internal::Object) ()
[Current thread is 1 (Thread 0x7f7c0579a7c0 (LWP 374115))]
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-127.el8.x86_64 libgcc-8.3.1-5.1.el8.x86_64 libstdc++-8.3.1-5.1.el8.x86_64
(gdb) bt
#0 0x0000000000ea21e8 in v8::internal::GlobalHandles::Create(v8::internal::Object) ()
#1 0x0000000000d0c2d4 in v8::V8::GlobalizeReference(v8::internal::Isolate*, unsigned long*) ()
#2 0x0000000000aeed03 in napi_create_threadsafe_function ()
#3 0x00007f7c01703ead in Napi::ThreadSafeFunction::New<char const*, Napi::Reference<Napi::Value>, (anonymous namespace)::TSFNWrap::TSFNWrap(const Napi::CallbackInfo&)::<lambda(Napi::Env, Napi::Reference<Napi::Value>*)>, void>(napi_env, const Napi::Function &, const Napi::Object &, const char *, size_t, size_t, Napi::Reference<Napi::Value> *, (anonymous namespace)::TSFNWrap::<lambda(Napi::Env, Napi::Reference<Napi::Value>*)>, void *, napi_finalize) (env=0x5378ef0,
callback=..., resource=..., resourceName=0x7f7c017380c3 "Test", maxQueueSize=1, initialThreadCount=1, context=0x532ffa0, finalizeCallback=..., data=0x0,
wrapper=0x7f7c01703d45 <Napi::details::ThreadSafeFinalize<Napi::Reference<Napi::Value>, (anonymous namespace)::TSFNWrap::TSFNWrap(const Napi::CallbackInfo&)::<lambda(Napi::Env, Napi::Reference<Napi::Value>*)>, void>::FinalizeWrapperWithContext(napi_env, void *, void *)>) at ../../napi-inl.h:5261
#4 0x00007f7c017033b9 in Napi::ThreadSafeFunction::New<char const*, Napi::Reference<Napi::Value>, (anonymous namespace)::TSFNWrap::TSFNWrap(const Napi::CallbackInfo&)::<lambda(Napi::Env, Napi::Reference<Napi::Value>*)> >(napi_env, const Napi::Function &, const Napi::Object &, const char *, size_t, size_t, Napi::Reference<Napi::Value> *, (anonymous namespace)::TSFNWrap::<lambda(Napi::Env, Napi::Reference<Napi::Value>*)>) (env=0x5378ef0, callback=..., resource=...,
resourceName=0x7f7c017380c3 "Test", maxQueueSize=1, initialThreadCount=1, context=0x532ffa0, finalizeCallback=...) at ../../napi-inl.h:5119
#5 0x00007f7c01702e78 in (anonymous namespace)::TSFNWrap::TSFNWrap (this=0x53fc120, info=...) at ../threadsafe_function/threadsafe_function_ctx.cc:54
#6 0x00007f7c01704335 in Napi::ObjectWrap<(anonymous namespace)::TSFNWrap>::<lambda()>::operator()(void) const (this=0x7ffdd8d6e370)
at ../../napi-inl.h:4034
#7 0x00007f7c01704c86 in Napi::details::WrapCallback<Napi::ObjectWrap<T>::ConstructorCallbackWrapper(napi_env, napi_callback_info) [with T = (anonymous namespace)::TSFNWrap]::<lambda()> >(Napi::ObjectWrap<(anonymous namespace)::TSFNWrap>::<lambda()>) (callback=...) at ../../napi-inl.h:82
#8 0x00007f7c0170449f in Napi::ObjectWrap<(anonymous namespace)::TSFNWrap>::ConstructorCallbackWrapper (env=0x5378ef0, info=0x7ffdd8d6e400)
at ../../napi-inl.h:4032
#9 0x0000000000acfdaf in v8impl::(anonymous namespace)::FunctionCallbackWrapper::Invoke(v8::FunctionCallbackInfo<v8::Value> const&) ()
#10 0x0000000000d72ac5 in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<true>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) ()
#11 0x0000000000d730af in v8::internal::Builtin_Impl_HandleApiCall(v8::internal::BuiltinArguments, v8::internal::Isolate*) ()
#12 0x0000000000d736d6 in v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) ()
#13 0x0000000001638539 in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit ()
#14 0x00000000015ceff8 in Builtins_JSBuiltinsConstructStub () |
Related to the last crash 41 TSFNWrap::TSFNWrap(const CallbackInfo &info) : ObjectWrap<TSFNWrap>(info) {
42 Napi::Env env = info.Env();
43
44 Reference<Napi::Value> *_ctx = new Reference<Napi::Value>;
45 *_ctx = Persistent(info[0]);
46
47 _tsfn = ThreadSafeFunction::New(
48 info.Env(), Function::New(env, [](const CallbackInfo & /*info*/) {}),
49 Object::New(env), "Test", 1, 1, _ctx,
50 [this](Napi::Env env, Reference<Napi::Value> *ctx) {
51 _deferred->Resolve(env.Undefined());
52 ctx->Reset();
53 delete ctx;
54 });
55 }
56
57 } // namespace
|
And #0 0x0000000000ea21e8 in v8::internal::GlobalHandles::Create(v8::internal::Object) ()
[Current thread is 1 (Thread 0x7fd9b2a657c0 (LWP 900741))]
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-127.el8.x86_64 libgcc-8.3.1-5.1.el8.x86_64 libstdc++-8.3.1-5.1.el8.x86_64
(gdb) bt
#0 0x0000000000ea21e8 in v8::internal::GlobalHandles::Create(v8::internal::Object) ()
#1 0x0000000000d0c2d4 in v8::V8::GlobalizeReference(v8::internal::Isolate*, unsigned long*) ()
#2 0x0000000000ad96d3 in napi_create_promise ()
#3 0x00007fd9b01a8e41 in Napi::Promise::Deferred::Deferred (this=0x51c1790, env=0x5208ef0) at ../../napi-inl.h:2157
#4 0x00007fd9b01d0adc in (anonymous namespace)::TSFNWrap::Release (this=0x534da80, info=...) at ../threadsafe_function/threadsafe_function_ctx.cc:21
#5 0x00007fd9b01d14ce in Napi::InstanceWrap<(anonymous namespace)::TSFNWrap>::<lambda()>::operator()(void) const (this=0x7fffef3ba650)
at ../../napi-inl.h:3617
#6 0x00007fd9b01d2001 in Napi::details::WrapCallback<Napi::InstanceWrap<T>::InstanceMethodCallbackWrapper(napi_env, napi_callback_info) [with T = (anonymous namespace)::TSFNWrap]::<lambda()> >(Napi::InstanceWrap<(anonymous namespace)::TSFNWrap>::<lambda()>) (callback=...) at ../../napi-inl.h:82
#7 0x00007fd9b01d152b in Napi::InstanceWrap<(anonymous namespace)::TSFNWrap>::InstanceMethodCallbackWrapper (env=0x5208ef0, info=0x7fffef3ba6b0)
at ../../napi-inl.h:3610
#8 0x0000000000acfdaf in v8impl::(anonymous namespace)::FunctionCallbackWrapper::Invoke(v8::FunctionCallbackInfo<v8::Value> const&) ()
#9 0x0000000000d71bab in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) ()
#10 0x0000000000d7305c in v8::internal::Builtin_Impl_HandleApiCall(v8::internal::BuiltinArguments, v8::internal::Isolate*) ()
#11 0x0000000000d736d6 in v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) ()
#12 0x0000000001638539 in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit ()
#13 0x00000000015d17eb in Builtins_InterpreterEntryTrampoline ()
|
Going to recompile node in debug mode to see if we get any more info as to what is going on in the GlobalHandles code. |
Of note I am testing with current master, we've seen the similar failure in master. From the CI this is what we have seen lately: |
|
Seems like a debug check fires in the same place
|
I'm wondering if this is related to the issue we see in the ci or specific to main. Going to run with 16.x to see if we get the same debug checks there. |
16.x seems to have less, just 2 in over 10000 runs so far. This is one of the stack traces:
|
I'm not sure but all of the stack traces have GlobalHandles Create and Destroy involved... Is there something wrong with the test's code? Should I create handles? For example, I don't create a handle in Finalize because there wasn't one done in the original Node-API tsfn test https://github.com/nodejs/node/blob/4b0776ab64596d1bbd98a93e3de7cee82a7e8130/test/node-api/test_threadsafe_function/binding.c#L167-L188 But this one is odd:
This one is part of the synchronous stack from JavaScript -> C++ binding
|
I tried running just the one test and 30000 iterations led to no recreates so trying with larger set of tests. |
@KevinEady still trying to narrow down, I agree it's not obvious why some of those failures could/would happen. I'm thinking maybe we are getting memory corruption which then leads to those asserts. Trying to narrow down recreate but seems like it running a single test does not result in the problem. |
Have been working through removing just a subset of tests. Seems like if I remove just objectwrap_worker_thread then I got no failures in 26300 runs. Typically I would have gotten failures long before then. |
objectwrap_worker_thread: 'use strict';
const { Worker, isMainThread, workerData } = require('worker_threads');
if (isMainThread) {
const buildType = process.config.target_defaults.default_configuration;
new Worker(__filename, { workerData: buildType });
} else {
const test = binding => {
new binding.objectwrap.Test();
};
const buildType = workerData;
require('./common').runTest(test, buildType);
} |
Turns out the worker in objectwrap_worker_thread was running in parallel with other tests, not sure if that is related to the crash but have a refactored version that makes sure worker has completed before running the rest of the tests to test out. |
15000 runs with the refactor and looking good. Heading out for a long weekend so I don't want to leave running but I'll kickoff longer run on Tuesday and then follow up with PR if its still looks good. |
Ok so 41595 more successful runs with the change, will submit a PR. There is an unrelated issue I found along the way but will open a different issue fort that. |
fixes: nodejs#1022 The objectwrap_worker_thread and symbol tests were not waiting for the test to complete before the subsequent tests were started. This caused intermitted crashes in the CI. Updated both tests so that they complete before the next test runs. Signed-off-by: Michael Dawson <[email protected]>
Submitted #1024 |
fixes: nodejs#1022 The objectwrap_worker_thread and symbol tests were not waiting for the test to complete before the subsequent tests were started. This caused intermitted crashes in the CI. Updated both tests so that they complete before the next test runs. Signed-off-by: Michael Dawson <[email protected]> PR-URL: nodejs#1024 Reviewed-By: Chengzhong Wu <[email protected]>
fixes: nodejs#1022 The objectwrap_worker_thread and symbol tests were not waiting for the test to complete before the subsequent tests were started. This caused intermitted crashes in the CI. Updated both tests so that they complete before the next test runs. Signed-off-by: Michael Dawson <[email protected]> PR-URL: nodejs#1024 Reviewed-By: Chengzhong Wu <[email protected]>
fixes: nodejs/node-addon-api#1022 The objectwrap_worker_thread and symbol tests were not waiting for the test to complete before the subsequent tests were started. This caused intermitted crashes in the CI. Updated both tests so that they complete before the next test runs. Signed-off-by: Michael Dawson <[email protected]> PR-URL: nodejs/node-addon-api#1024 Reviewed-By: Chengzhong Wu <[email protected]>
fixes: nodejs/node-addon-api#1022 The objectwrap_worker_thread and symbol tests were not waiting for the test to complete before the subsequent tests were started. This caused intermitted crashes in the CI. Updated both tests so that they complete before the next test runs. Signed-off-by: Michael Dawson <[email protected]> PR-URL: nodejs/node-addon-api#1024 Reviewed-By: Chengzhong Wu <[email protected]>
fixes: nodejs/node-addon-api#1022 The objectwrap_worker_thread and symbol tests were not waiting for the test to complete before the subsequent tests were started. This caused intermitted crashes in the CI. Updated both tests so that they complete before the next test runs. Signed-off-by: Michael Dawson <[email protected]> PR-URL: nodejs/node-addon-api#1024 Reviewed-By: Chengzhong Wu <[email protected]>
fixes: nodejs/node-addon-api#1022 The objectwrap_worker_thread and symbol tests were not waiting for the test to complete before the subsequent tests were started. This caused intermitted crashes in the CI. Updated both tests so that they complete before the next test runs. Signed-off-by: Michael Dawson <[email protected]> PR-URL: nodejs/node-addon-api#1024 Reviewed-By: Chengzhong Wu <[email protected]>
There have been a few recent crashes (maybe 3 in last 15 or so days) in the CI https://ci.nodejs.org/view/x%20-%20Abi%20stable%20module%20API/job/node-test-node-addon-api-LTS%20versions/
A number of the failures look like the following just before the crash
The text was updated successfully, but these errors were encountered: