
src: expose node.http trace flag by array #48142

Closed
wants to merge 2 commits

Conversation

H4ad
Member

@H4ad H4ad commented May 23, 2023

This PR aims to fix the issue raised in the Node.js performance repo: nodejs/performance#81

The function isTraceHTTPEnabled is a little expensive; in general, it should not even appear in the profile, since it only checks whether the trace category is enabled.

The solution was inspired by nodejs/performance#81 (comment): it exposes the C++ pointer to the flag that tells whether the node.http trace category is enabled, by wrapping it in an array.
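
For illustration, here is roughly what the change means on the JS side. The "before" reflects my reading of the current lib/internal/http.js, and the "after" uses placeholder names (the 'tracing' binding and tracingCategories property are illustrative, not necessarily the exact code in this PR):

// Before: every check crosses the JS/C++ boundary.
const { isTraceCategoryEnabled } = internalBinding('trace_events');
function isTraceHTTPEnabledOld() {
  return isTraceCategoryEnabled('node.http');
}

// After (sketch): the C++ side exposes a one-element Uint8Array whose backing
// store is the category-enabled byte for 'node.http', so the check is a plain
// typed-array read with no C++ call on the hot path.
const { tracingCategories } = internalBinding('tracing');
function isTraceHTTPEnabledNew() {
  return tracingCategories[0] !== 0;
}

Note that internalBinding is only available inside Node core internals, so this snippet is Node-internals-style pseudocode rather than something runnable in user land.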

To Reviewers

Some interesting pieces of code to look at when reviewing this PR:

  • Built-in Trace Function:
    BUILTIN(IsTraceCategoryEnabled) {
      HandleScope scope(isolate);
      Handle<Object> category = args.atOrUndefined(isolate, 1);
      if (!category->IsString()) {
        THROW_NEW_ERROR_RETURN_FAILURE(
            isolate, NewTypeError(MessageTemplate::kTraceEventCategoryError));
      }
      bool enabled;
    #if defined(V8_USE_PERFETTO)
      MaybeUtf8 category_str(isolate, Handle<String>::cast(category));
      perfetto::DynamicCategory dynamic_category{*category_str};
      enabled = TRACE_EVENT_CATEGORY_ENABLED(dynamic_category);
    #else
      enabled = *GetCategoryGroupEnabled(isolate, Handle<String>::cast(category));
    #endif
      return isolate->heap()->ToBoolean(enabled);
    }
  • The implementation of GetCategoryGroupEnabled:
    const uint8_t* TracingController::GetCategoryGroupEnabled(
        const char* category_group) {
      // Check that category group does not contain double quote
      DCHECK(!strchr(category_group, '"'));
      // The g_category_groups is append only, avoid using a lock for the fast path.
      size_t category_index = base::Acquire_Load(&g_category_index);
      // Search for pre-existing category group.
      for (size_t i = 0; i < category_index; ++i) {
        if (strcmp(g_category_groups[i], category_group) == 0) {
          return &g_category_group_enabled[i];
        }
      }
      // Slow path. Grab the lock.
      base::MutexGuard lock(mutex_.get());
      // Check the list again with lock in hand.
      unsigned char* category_group_enabled = nullptr;
      category_index = base::Acquire_Load(&g_category_index);
      for (size_t i = 0; i < category_index; ++i) {
        if (strcmp(g_category_groups[i], category_group) == 0) {
          return &g_category_group_enabled[i];
        }
      }
      // Create a new category group.
      // Check that there is a slot for the new category_group.
      DCHECK(category_index < kMaxCategoryGroups);
      if (category_index < kMaxCategoryGroups) {
        // Don't hold on to the category_group pointer, so that we can create
        // category groups with strings not known at compile time (this is
        // required by SetWatchEvent).
        const char* new_group = base::Strdup(category_group);
        g_category_groups[category_index] = new_group;
        DCHECK(!g_category_group_enabled[category_index]);
        // Note that if both included and excluded patterns in the
        // TraceConfig are empty, we exclude nothing,
        // thereby enabling this category group.
        UpdateCategoryGroupEnabledFlag(category_index);
        category_group_enabled = &g_category_group_enabled[category_index];
        // Update the max index now.
        base::Release_Store(&g_category_index, category_index + 1);
      } else {
        category_group_enabled =
            &g_category_group_enabled[g_category_categories_exhausted];
      }
      return category_group_enabled;
    }
  • The macro used to get the pointer to a category's enabled flag:
    // Get a pointer to the enabled state of the given trace category. Only
    // long-lived literal strings should be given as the category group. The
    // returned pointer can be held permanently in a local static for example. If
    // the unsigned char is non-zero, tracing is enabled. If tracing is enabled,
    // TRACE_EVENT_API_ADD_TRACE_EVENT can be called. It's OK if tracing is disabled
    // between the load of the tracing state and the call to
    // TRACE_EVENT_API_ADD_TRACE_EVENT, because this flag only provides an early out
    // for best performance when tracing is disabled.
    // const uint8_t*
    // TRACE_EVENT_API_GET_CATEGORY_GROUP_ENABLED(const char* category_group)
    #define TRACE_EVENT_API_GET_CATEGORY_GROUP_ENABLED \
    node::tracing::TraceEventHelper::GetCategoryGroupEnabled

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/gyp
  • @nodejs/startup

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels May 23, 2023
@H4ad
Member Author

H4ad commented May 23, 2023

Some comments about the current implementation:

I created new files, node_tracing.cc and node_tracing.h, because I could not make the code compile when I put it inside node_trace_events.cc.

Also, the current code does not compile, due to the following error:

../../src/node_tracing.cc: In constructor ‘node::tracing::BindingData::BindingData(node::Realm*, v8::Local<v8::Object>)’:
../../src/node_tracing.cc:45:35: error: invalid conversion from ‘uint8_t*’ {aka ‘unsigned char*’} to ‘unsigned char’ [-fpermissive]
   45 |   tracing_categories_buffer_[0] = const_cast<uint8_t*>(TRACE_EVENT_API_GET_CATEGORY_GROUP_ENABLED("node.http"));
      |                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                   |
      |                                   uint8_t* {aka unsigned char*}

This makes sense, since tracing_categories_buffer_ expects just a uint8_t value instead of a pointer to uint8_t.
But I cannot figure out which structure I could use here to be able to expose the uint8_t pointer to JS land.

So, @anonrig, @RafaelGSS or @bnoordhuis, do you have any hint on what I should do?

@bnoordhuis
Member

Try creating the ArrayBuffer directly from a custom BackingStore:

auto p = const_cast<uint8*>(TRACE_EVENT_API_GET_CATEGORY_GROUP_ENABLED("node.http"));
auto nop = [](void*, size_t, void*) {}; // no-op deleter
auto bs = v8::ArrayBuffer::NewBackingStore(p, 1, nop, nullptr);
auto ab = v8::ArrayBuffer::New(isolate, bs);
auto u8 = v8::Uint8Array::New(ab, 0, 1);

@H4ad
Member Author

H4ad commented May 23, 2023

@bnoordhuis I needed to modify your code a little bit in order to compile, but now I cannot convert the Uint8Array to an AliasedUint8Array:

  auto p = const_cast<uint8_t*>(TRACE_EVENT_API_GET_CATEGORY_GROUP_ENABLED("node.http"));
  auto nop = [](void*, size_t, void*) {}; // no-op deleter
  auto bs = v8::ArrayBuffer::NewBackingStore(p, 1, nop, nullptr);
  auto ab = v8::ArrayBuffer::New(realm->isolate(), std::move(bs));
  auto u8 = v8::Uint8Array::New(ab, 0, 1);

  tracing_categories_buffer_ = AliasedUint8Array(realm->isolate(), 0, 1, u8, nullptr);

Message:

../../src/node_tracing.cc: In constructor ‘node::tracing::BindingData::BindingData(node::Realm*, v8::Local<v8::Object>)’:
../../src/node_tracing.cc:52:85: error: no matching function for call to ‘node::AliasedBufferBase<unsigned char, v8::Uint8Array>::AliasedBufferBase(v8::Isolate*, int, int, v8::Local<v8::Uint8Array>&, std::nullptr_t)’
   52 |   tracing_categories_buffer_ = AliasedUint8Array(realm->isolate(), 0, 1, u8, nullptr);
      |                                                                                     ^
In file included from ../../src/node_tracing.h:5,
                 from ../../src/node_tracing.cc:1:
../../src/aliased_buffer.h:56:3: note: candidate: ‘node::AliasedBufferBase<NativeT, V8T>::AliasedBufferBase(const node::AliasedBufferBase<NativeT, V8T>&) [with NativeT = unsigned char; V8T = v8::Uint8Array]’
   56 |   AliasedBufferBase(const AliasedBufferBase& that);
      |   ^~~~~~~~~~~~~~~~~
../../src/aliased_buffer.h:56:3: note:   candidate expects 1 argument, 5 provided
../../src/aliased_buffer.h:49:3: note: candidate: ‘node::AliasedBufferBase<NativeT, V8T>::AliasedBufferBase(v8::Isolate*, size_t, size_t, const node::AliasedBufferBase<unsigned char, v8::Uint8Array>&, const AliasedBufferIndex*) [with NativeT = unsigned char; V8T = v8::Uint8Array; size_t = long unsigned int; node::AliasedBufferIndex = long unsigned int]’
   49 |   AliasedBufferBase(
      |   ^~~~~~~~~~~~~~~~~
../../src/aliased_buffer.h:53:57: note:   no known conversion for argument 4 from ‘v8::Local<v8::Uint8Array>’ to ‘const node::AliasedBufferBase<unsigned char, v8::Uint8Array>&’
   53 |       const AliasedBufferBase<uint8_t, v8::Uint8Array>& backing_buffer,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
../../src/aliased_buffer.h:36:3: note: candidate: ‘node::AliasedBufferBase<NativeT, V8T>::AliasedBufferBase(v8::Isolate*, size_t, const AliasedBufferIndex*) [with NativeT = unsigned char; V8T = v8::Uint8Array; size_t = long unsigned int; node::AliasedBufferIndex = long unsigned int]’
   36 |   AliasedBufferBase(v8::Isolate* isolate,
      |   ^~~~~~~~~~~~~~~~~
../../src/aliased_buffer.h:36:3: note:   candidate expects 3 arguments, 5 provided
../../src/node_tracing.cc: In static member function ‘static void node::tracing::BindingData::CreatePerIsolateProperties(node::IsolateData*, v8::Local<v8::FunctionTemplate>)’:

From what I saw, I need to use AliasedUint8Array because of the GetJSArray and Release methods.

@bnoordhuis
Member

You don't want to use AliasedUint8Array here. That's basically an owned buffer whereas you are wrapping a pointer.

@H4ad H4ad changed the title from "perf: expose node.http trace flag inside array" to "src: expose node.http trace flag by array" May 23, 2023
@H4ad
Member Author

H4ad commented May 23, 2023

@bnoordhuis Thanks for the guidance, using just Uint8Array works great!

Now the code is compiling and I tested the new node with https://github.com/H4ad/nodejs-trace-http-performance-analysis and the isTraceHTTPEnabled function no longer appears in the profile.

old-node.cpuprofile.zip
new-node.cpuprofile.zip

About the raw performance improvements, I'll let the benchmark run, but from what I saw in a small test yesterday, the improvements were around 2% and I don't think the results will be greater than that; the function is expensive for what it does, but in the overall context it doesn't take much time.

The code is ready for review and I already have some concerns I want to address:

  • @anonrig Did you have any reason to use AliasedUint32Array instead of Uint32Array? I'm asking this because my implementation follows most of your node_url.cc implementation, so my code might be missing something important.
  • Do we really need to use BindingData? The use of BindingData was inspired by node_url.cc, but I think we could move this part of the code to node_trace_events.cc, remove many of the changes, and make this modification simpler.

@H4ad H4ad marked this pull request as ready for review May 23, 2023 22:50
lib/internal/http.js Outdated Show resolved Hide resolved
Comment on lines 91 to 102
NODE_BINDING_CONTEXT_AWARE_INTERNAL(
tracing, node::tracing::BindingData::CreatePerContextProperties)
NODE_BINDING_PER_ISOLATE_INIT(
tracing, node::tracing::BindingData::CreatePerIsolateProperties)
NODE_BINDING_EXTERNAL_REFERENCE(
tracing, node::tracing::BindingData::RegisterExternalReferences)
Member

@bnoordhuis bnoordhuis May 24, 2023

tbh, I'm unsure if you need this BindingData stuff. In principle you only need an Initialize() function that creates the typed array, but the array should not end up in the snapshot; it should be recreated on deserialization.

@joyeecheung what's the proper course of action here?

Member

It could still end up in a user-land snapshot, so configuring the serialization/deserialization of it would be good practice. The buffer here is recreated through the realm->AddBindingData<BindingData>() call in Deserialize (which invokes the constructor of BindingData and reassigns the buffer).

src/node_binding.cc Outdated Show resolved Hide resolved
lib/internal/http.js Outdated Show resolved Hide resolved
@H4ad H4ad force-pushed the perf/is-trace-http branch 3 times, most recently from 08f75a3 to b5a0b14 on May 25, 2023 13:03
@H4ad
Member Author

H4ad commented May 25, 2023

I left the benchmark running and here's the result:

                                                                                                 confidence improvement accuracy (*)   (**)  (***)
http/simple.js duration=5 chunkedEnc=0 c=50 chunks=1 len=1024 type='buffer' benchmarker='wrk'            **      1.64 %       ±1.07% ±1.43% ±1.87%
http/simple.js duration=5 chunkedEnc=0 c=50 chunks=1 len=1024 type='bytes' benchmarker='wrk'            ***      2.31 %       ±0.77% ±1.03% ±1.34%
http/simple.js duration=5 chunkedEnc=0 c=50 chunks=1 len=102400 type='buffer' benchmarker='wrk'          **      0.95 %       ±0.69% ±0.92% ±1.20%
http/simple.js duration=5 chunkedEnc=0 c=50 chunks=1 len=102400 type='bytes' benchmarker='wrk'           **     -0.94 %       ±0.66% ±0.87% ±1.14%
http/simple.js duration=5 chunkedEnc=0 c=50 chunks=1 len=4 type='buffer' benchmarker='wrk'              ***      2.88 %       ±0.95% ±1.26% ±1.64%
http/simple.js duration=5 chunkedEnc=0 c=50 chunks=1 len=4 type='bytes' benchmarker='wrk'               ***      2.82 %       ±0.93% ±1.24% ±1.62%
http/simple.js duration=5 chunkedEnc=0 c=50 chunks=4 len=1024 type='buffer' benchmarker='wrk'           ***      2.32 %       ±0.86% ±1.14% ±1.49%
http/simple.js duration=5 chunkedEnc=0 c=50 chunks=4 len=1024 type='bytes' benchmarker='wrk'            ***      2.30 %       ±0.88% ±1.17% ±1.52%
http/simple.js duration=5 chunkedEnc=0 c=50 chunks=4 len=102400 type='buffer' benchmarker='wrk'          **      1.01 %       ±0.72% ±0.96% ±1.25%
http/simple.js duration=5 chunkedEnc=0 c=50 chunks=4 len=102400 type='bytes' benchmarker='wrk'           **     -0.87 %       ±0.56% ±0.74% ±0.97%
http/simple.js duration=5 chunkedEnc=0 c=50 chunks=4 len=4 type='buffer' benchmarker='wrk'              ***      2.31 %       ±1.03% ±1.37% ±1.78%
http/simple.js duration=5 chunkedEnc=0 c=50 chunks=4 len=4 type='bytes' benchmarker='wrk'               ***      2.53 %       ±0.90% ±1.20% ±1.56%
http/simple.js duration=5 chunkedEnc=0 c=500 chunks=1 len=1024 type='buffer' benchmarker='wrk'          ***      2.67 %       ±0.79% ±1.05% ±1.38%
http/simple.js duration=5 chunkedEnc=0 c=500 chunks=1 len=1024 type='bytes' benchmarker='wrk'           ***      3.10 %       ±0.83% ±1.11% ±1.45%
http/simple.js duration=5 chunkedEnc=0 c=500 chunks=1 len=102400 type='buffer' benchmarker='wrk'        ***      1.54 %       ±0.77% ±1.03% ±1.35%
http/simple.js duration=5 chunkedEnc=0 c=500 chunks=1 len=102400 type='bytes' benchmarker='wrk'                 -1.13 %       ±1.22% ±1.62% ±2.12%
http/simple.js duration=5 chunkedEnc=0 c=500 chunks=1 len=4 type='buffer' benchmarker='wrk'             ***      2.65 %       ±0.81% ±1.07% ±1.40%
http/simple.js duration=5 chunkedEnc=0 c=500 chunks=1 len=4 type='bytes' benchmarker='wrk'              ***      3.79 %       ±0.84% ±1.12% ±1.46%
http/simple.js duration=5 chunkedEnc=0 c=500 chunks=4 len=1024 type='buffer' benchmarker='wrk'          ***      2.55 %       ±0.75% ±0.99% ±1.29%
http/simple.js duration=5 chunkedEnc=0 c=500 chunks=4 len=1024 type='bytes' benchmarker='wrk'           ***      2.39 %       ±0.75% ±0.99% ±1.30%
http/simple.js duration=5 chunkedEnc=0 c=500 chunks=4 len=102400 type='buffer' benchmarker='wrk'        ***      1.36 %       ±0.52% ±0.69% ±0.90%
http/simple.js duration=5 chunkedEnc=0 c=500 chunks=4 len=102400 type='bytes' benchmarker='wrk'                 -1.03 %       ±1.04% ±1.39% ±1.82%
http/simple.js duration=5 chunkedEnc=0 c=500 chunks=4 len=4 type='buffer' benchmarker='wrk'             ***      2.03 %       ±0.55% ±0.73% ±0.95%
http/simple.js duration=5 chunkedEnc=0 c=500 chunks=4 len=4 type='bytes' benchmarker='wrk'              ***      2.53 %       ±0.65% ±0.87% ±1.13%
http/simple.js duration=5 chunkedEnc=1 c=50 chunks=1 len=1024 type='buffer' benchmarker='wrk'           ***      1.99 %       ±0.94% ±1.25% ±1.63%
http/simple.js duration=5 chunkedEnc=1 c=50 chunks=1 len=1024 type='bytes' benchmarker='wrk'            ***      2.35 %       ±0.77% ±1.03% ±1.34%
http/simple.js duration=5 chunkedEnc=1 c=50 chunks=1 len=102400 type='buffer' benchmarker='wrk'         ***      1.42 %       ±0.64% ±0.86% ±1.11%
http/simple.js duration=5 chunkedEnc=1 c=50 chunks=1 len=102400 type='bytes' benchmarker='wrk'          ***     -1.62 %       ±0.48% ±0.64% ±0.84%
http/simple.js duration=5 chunkedEnc=1 c=50 chunks=1 len=4 type='buffer' benchmarker='wrk'              ***      1.69 %       ±0.80% ±1.06% ±1.38%
http/simple.js duration=5 chunkedEnc=1 c=50 chunks=1 len=4 type='bytes' benchmarker='wrk'               ***      2.93 %       ±0.81% ±1.07% ±1.40%
http/simple.js duration=5 chunkedEnc=1 c=50 chunks=4 len=1024 type='buffer' benchmarker='wrk'            **      1.25 %       ±0.85% ±1.13% ±1.48%
http/simple.js duration=5 chunkedEnc=1 c=50 chunks=4 len=1024 type='bytes' benchmarker='wrk'            ***      2.06 %       ±1.10% ±1.47% ±1.92%
http/simple.js duration=5 chunkedEnc=1 c=50 chunks=4 len=102400 type='buffer' benchmarker='wrk'         ***      1.12 %       ±0.59% ±0.79% ±1.03%
http/simple.js duration=5 chunkedEnc=1 c=50 chunks=4 len=102400 type='bytes' benchmarker='wrk'                  -3.35 %       ±3.48% ±4.69% ±6.22%
http/simple.js duration=5 chunkedEnc=1 c=50 chunks=4 len=4 type='buffer' benchmarker='wrk'              ***      1.97 %       ±1.06% ±1.41% ±1.83%
http/simple.js duration=5 chunkedEnc=1 c=50 chunks=4 len=4 type='bytes' benchmarker='wrk'               ***      1.75 %       ±0.93% ±1.23% ±1.60%
http/simple.js duration=5 chunkedEnc=1 c=500 chunks=1 len=1024 type='buffer' benchmarker='wrk'          ***      2.96 %       ±0.75% ±1.00% ±1.31%
http/simple.js duration=5 chunkedEnc=1 c=500 chunks=1 len=1024 type='bytes' benchmarker='wrk'           ***      2.27 %       ±0.76% ±1.01% ±1.32%
http/simple.js duration=5 chunkedEnc=1 c=500 chunks=1 len=102400 type='buffer' benchmarker='wrk'        ***      1.56 %       ±0.58% ±0.78% ±1.01%
http/simple.js duration=5 chunkedEnc=1 c=500 chunks=1 len=102400 type='bytes' benchmarker='wrk'                 -2.73 %       ±3.14% ±4.24% ±5.62%
http/simple.js duration=5 chunkedEnc=1 c=500 chunks=1 len=4 type='buffer' benchmarker='wrk'             ***      1.88 %       ±0.68% ±0.91% ±1.18%
http/simple.js duration=5 chunkedEnc=1 c=500 chunks=1 len=4 type='bytes' benchmarker='wrk'              ***      2.64 %       ±0.89% ±1.19% ±1.55%
http/simple.js duration=5 chunkedEnc=1 c=500 chunks=4 len=1024 type='buffer' benchmarker='wrk'          ***      1.90 %       ±0.85% ±1.13% ±1.48%
http/simple.js duration=5 chunkedEnc=1 c=500 chunks=4 len=1024 type='bytes' benchmarker='wrk'           ***      1.69 %       ±0.65% ±0.86% ±1.12%
http/simple.js duration=5 chunkedEnc=1 c=500 chunks=4 len=102400 type='buffer' benchmarker='wrk'         **      0.77 %       ±0.52% ±0.69% ±0.90%
http/simple.js duration=5 chunkedEnc=1 c=500 chunks=4 len=102400 type='bytes' benchmarker='wrk'         ***     -1.20 %       ±0.65% ±0.87% ±1.13%
http/simple.js duration=5 chunkedEnc=1 c=500 chunks=4 len=4 type='buffer' benchmarker='wrk'             ***      1.57 %       ±0.64% ±0.85% ±1.11%
http/simple.js duration=5 chunkedEnc=1 c=500 chunks=4 len=4 type='bytes' benchmarker='wrk'              ***      1.89 %       ±0.65% ±0.87% ±1.13%

Be aware that when doing many comparisons the risk of a false-positive
result increases. In this case, there are 48 comparisons, you can thus
expect the following amount of false-positive results:
  2.40 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.48 false positives, when considering a   1% risk acceptance (**, ***),
  0.05 false positives, when considering a 0.1% risk acceptance (***)

In general, this change will not bring major performance improvements; as I highlighted at #48142 (comment), the main goal was to remove isTraceHTTPEnabled from the profile, and we achieved that.

With one function fewer, we can start working on other functions until these changes stack up to a noticeable perf difference.

@H4ad
Member Author

H4ad commented May 25, 2023

I have no idea what is causing the CI build to fail; locally I can build without errors.

@anonrig
Member

anonrig commented May 29, 2023

@nodejs/build any ideas?

@RafaelGSS
Member

It should be failing locally. Did you try running ./configure again?

@H4ad
Member Author

H4ad commented May 30, 2023

@RafaelGSS Yeah, I tried again using the same commands as the CI, no errors.

src/node_tracing.cc Outdated Show resolved Hide resolved
Instead of calling into C++ every time we make an HTTP request,
we now get the C++ pointer to the flag that holds the info on
whether the trace is enabled for node.http, and we expose that
flag using BindingData. With this change, no C++ call is made
and access to the info happens on the JS side, which has no
perf penalty.
@anonrig anonrig added the request-ci Add this label to start a Jenkins CI on a PR. label Jun 26, 2023
@anonrig anonrig added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Jun 26, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Jun 26, 2023
@nodejs-github-bot
Collaborator

@anonrig anonrig requested a review from RaisinTen June 26, 2023 20:17
@anonrig
Member

anonrig commented Jul 3, 2023

@nodejs/http @jasnell appreciate any reviews on this PR

Contributor

@RaisinTen RaisinTen left a comment

From what I can tell, this change increases the complexity of the code for a low performance gain. Is it worth doing this then?

src/node_trace_events.h Outdated Show resolved Hide resolved
src/node_trace_events.cc Outdated Show resolved Hide resolved
src/node_trace_events.cc Outdated Show resolved Hide resolved
src/node_trace_events.cc Outdated Show resolved Hide resolved
src/node_trace_events.cc Outdated Show resolved Hide resolved
Member

@mcollina mcollina left a comment

lgtm

@anonrig anonrig added the request-ci Add this label to start a Jenkins CI on a PR. label Jul 5, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Jul 5, 2023
@nodejs-github-bot
Collaborator

Member

@RafaelGSS RafaelGSS left a comment

I agree with @RaisinTen. Just removing a function from a sample doesn't bring any value to the project. Unless the code added improves the performance significantly, I don't think we would want that tradeoff.

@H4ad
Member Author

H4ad commented Jul 7, 2023

As I understand it, the current question is: how many lines is it worth introducing for this performance improvement?

Honestly, I don't know. From my perspective, I think it's worth adding 124 lines to remove the overhead of calling isTraceHTTPEnabled, and I also agree that the function doesn't take enough of the total time for its removal to noticeably benefit any HTTP benchmark, but it's still there:

(screenshot: CPU profile showing isTraceHTTPEnabled still appearing in the samples)

const http = require('http');

let numOfReqs = 0;

const dispatch = function (req, res) {
  const reply = 'Hello World';
  res.writeHead(200, { 'Content-Type': 'text/plain', 'Content-Length': reply.length });
  res.end(reply);

  numOfReqs++;

  if (numOfReqs > 1e5) {
    server.close();
  }
};

const server = new http.Server(dispatch);

server.listen(3000);

node --cpu-prof index-http.js & autocannon -a 120000 -p 10 http://localhost:3000

This change alone will not benefit that much, but a couple of these changes could be beneficial in the long term.


And of course, if you have any ideas on how to make this improvement with less code, I can try to work on that; until then, I think this is the best we can do to eliminate this performance penalty.

@RafaelGSS
Member

how many lines is it worth introducing for this performance improvement?

It's not about lines of code, but complexity. If you compare both versions, it's way easier to maintain/read the current approach than the one you are trying to introduce.

As pointed out by the benchmarks, there's no overhead on isTraceHTTPEnabled() (you can see the same in the screenshot you shared).

And of course, if you have any ideas on how to make this improvement with less code, I can try to work on that; until then, I think this is the best we can do to eliminate this performance penalty.

Just because it's not appearing in the samples doesn't mean the performance penalty isn't there. It might be inlined, and considering this is called in C++ land, you might want to profile it using the kernel stack. Anyway, what I'm trying to say is that there's nothing (yet) in this PR that makes the application better, and due to the nature of this operation, any improvement might not be relevant.

@H4ad
Member Author

H4ad commented Jul 7, 2023

Just because it's not appearing in the samples doesn't mean the performance penalty isn't there. It might be inlined, and considering this is called in C++ land, you might want to profile it using the kernel stack

The penalty was transferred to the initialization of the buffer. To check isTraceHTTPEnabled, we read a buffer that contains just one item backed by the pointer to the flag, so there is no penalty here; it's like reading a value from an array, and we don't call into C++ anymore.
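
Here is a standalone model of that mechanism in plain JavaScript (not Node internals): two Uint8Array views share the same backing memory, so a write through one view (standing in for the C++ side flipping the flag) is immediately visible to a read through the other, without any boundary crossing:

const backing = new ArrayBuffer(1);
const cppSideFlag = new Uint8Array(backing); // stands in for the flag owned by C++
const exposedView = new Uint8Array(backing); // stands in for the array exposed to JS

const isTraceHTTPEnabled = () => exposedView[0] !== 0;

console.log(isTraceHTTPEnabled()); // false
cppSideFlag[0] = 1;                // tracing gets enabled "on the C++ side"
console.log(isTraceHTTPEnabled()); // true: the change is visible via a plain array read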

Anyway, what I'm trying to say is that there's nothing (yet) in this PR that makes the application better, and due to the nature of this operation, any improvement might not be relevant.

Any suggestions on what to do? The goal of this PR was achieved, and the solution is, I think, the most performant we could get; the buffer (from what I understand) can only be exposed using BindingData, so there is no way to remove it.

The trace flag can be enabled by embedders, so there is no way to achieve this only in JS land (ref).

@RafaelGSS
Member

Any suggestions on what to do? The goal of this PR was achieved, and the solution is, I think, the most performant we could get; the buffer (from what I understand) can only be exposed using BindingData, so there is no way to remove it.

Without making this PR even more complex, I don't think so.

@H4ad
Member Author

H4ad commented Jul 19, 2023

Closed due to the discussion at nodejs/performance#102
