src: expose node.http trace flag by array #48142
Conversation
Some comments about the current implementation: I created another file called … Also, the current code does not compile, due to the following error: … This makes sense since the … So, @anonrig, @RafaelGSS or @bnoordhuis, do you have a hint of what I should do?
Try creating the ArrayBuffer directly from a custom BackingStore:

```cpp
auto p = const_cast<uint8*>(TRACE_EVENT_API_GET_CATEGORY_GROUP_ENABLED("node.http"));
auto nop = [](void*, size_t, void*) {}; // no-op deleter
auto bs = v8::ArrayBuffer::NewBackingStore(p, 1, nop, nullptr);
auto ab = v8::ArrayBuffer::New(isolate, bs);
auto u8 = v8::Uint8Array::New(ab, 0, 1);
```
@bnoordhuis I needed to modify your code a little bit in order to compile, but now I cannot convert the Uint8Array to AliasedUint8Array:

```cpp
auto p = const_cast<uint8_t*>(TRACE_EVENT_API_GET_CATEGORY_GROUP_ENABLED("node.http"));
auto nop = [](void*, size_t, void*) {}; // no-op deleter
auto bs = v8::ArrayBuffer::NewBackingStore(p, 1, nop, nullptr);
auto ab = v8::ArrayBuffer::New(realm->isolate(), std::move(bs));
auto u8 = v8::Uint8Array::New(ab, 0, 1);
tracing_categories_buffer_ = AliasedUint8Array(realm->isolate(), 0, 1, u8, nullptr);
```

Message: …

From what I saw, I need to use …
You don't want to use AliasedUint8Array here. That's basically an owned buffer, whereas you are wrapping a pointer.
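For illustration only, a minimal sketch of what that advice implies (the class and member names are made up for this example, not the PR's actual code): instead of an AliasedUint8Array, which allocates and owns its backing memory, the binding can hold the wrapping view with a plain v8::Global handle, since the single byte of storage is owned by the tracing subsystem.

```cpp
#include <v8.h>

// Sketch: keep a strong handle to a Uint8Array that views one externally
// owned byte. Assumes `flag` outlives the isolate (hence the no-op deleter)
// and that Init() runs with an active HandleScope.
class TraceFlagView {
 public:
  void Init(v8::Isolate* isolate, uint8_t* flag) {
    auto bs = v8::ArrayBuffer::NewBackingStore(
        flag, 1, [](void*, size_t, void*) {}, nullptr);  // no-op deleter
    auto ab = v8::ArrayBuffer::New(isolate, std::move(bs));
    view_.Reset(isolate, v8::Uint8Array::New(ab, 0, 1));
  }

  v8::Local<v8::Uint8Array> Get(v8::Isolate* isolate) const {
    return view_.Get(isolate);
  }

 private:
  v8::Global<v8::Uint8Array> view_;  // strong handle, not an owned buffer
};
```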
@bnoordhuis Thanks for the guidance, using just Uint8Array works great! The code now compiles and I tested the new node with https://github.com/H4ad/nodejs-trace-http-performance-analysis and the …

About the raw performance improvements, I'll let the benchmark run, but from what I saw in a small test yesterday, the improvements were around 2% and I don't think the results will be greater than that: the function is expensive for what it is doing, but in the overall context it does not take much time.

The code is ready for review and I already have some concerns I want to address: …
src/node_tracing.cc (outdated)

```cpp
NODE_BINDING_CONTEXT_AWARE_INTERNAL(
    tracing, node::tracing::BindingData::CreatePerContextProperties)
NODE_BINDING_PER_ISOLATE_INIT(
    tracing, node::tracing::BindingData::CreatePerIsolateProperties)
NODE_BINDING_EXTERNAL_REFERENCE(
    tracing, node::tracing::BindingData::RegisterExternalReferences)
```
tbh, I'm unsure if you need this BindingData stuff. In principle you only need an Initialize() function that creates the typed array, but the array should not end up in the snapshot; it should be recreated on deserialization.
@joyeecheung what's the proper course of action here?
It could still end up in a user-land snapshot, so configuring the serialization/deserialization of it would be good practice. The buffer here is recreated through the realm->AddBindingData<BindingData>() call in Deserialize (which invokes the constructor of BindingData and reassigns the buffer).
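For reference, a rough sketch of the deserialization hook being described, modeled on how other snapshot-aware binding data classes in src/ do it (exact signatures and helper names vary between Node.js versions, so treat this as an approximation rather than the PR's actual code):

```cpp
// Illustrative: on snapshot deserialization, re-add the binding data so its
// constructor runs again and rebuilds the Uint8Array over the live trace-flag
// pointer, instead of keeping stale bytes captured in the snapshot.
void BindingData::Deserialize(v8::Local<v8::Context> context,
                              v8::Local<v8::Object> holder,
                              int index,
                              InternalFieldInfoBase* info) {
  v8::HandleScope scope(context->GetIsolate());
  Realm* realm = Realm::GetCurrent(context);
  // Invokes the BindingData constructor, which reassigns the buffer.
  BindingData* binding = realm->AddBindingData<BindingData>(context, holder);
  CHECK_NOT_NULL(binding);
}
```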
Force-pushed from 08f75a3 to b5a0b14
I left the benchmark running and here's the result: …

In general, this change will not bring major performance improvements; as I highlighted at #48142 (comment), the main goal was to remove the … With one function less, we can start to work on other functions until these changes stack up to a noticeable perf difference.
I don't have any idea what is causing the CI to fail on build; locally, I can build.
@nodejs/build any ideas?
It should be failing locally. Did you try running …?
@RafaelGSS Yeah, I tried again using the same commands as the CI, no errors.
Instead of calling into C++ every time we make an HTTP request, we now get the C++ pointer to the flag that indicates whether tracing is enabled for node.http and expose that flag using BindingData. With this change, no C++ call is made and the flag is read on the JS side, which has no perf penalty.
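As a rough illustration of that shape (the function and property names below, such as PublishTraceFlag and traceCategoryState, are invented for this sketch and are not the PR's actual identifiers): the C++ side publishes the one-byte view on the binding's exports once, and the per-request check on the JS side becomes a plain typed-array read.

```cpp
#include <v8.h>

// Illustrative: expose an existing one-byte Uint8Array view (created as in
// the earlier snippets) as a property on the binding's exports object.
void PublishTraceFlag(v8::Local<v8::Context> context,
                      v8::Local<v8::Object> target,
                      v8::Local<v8::Uint8Array> flag_view) {
  v8::Isolate* isolate = context->GetIsolate();
  target->Set(context,
              v8::String::NewFromUtf8Literal(isolate, "traceCategoryState"),
              flag_view)
      .Check();
  // JS then does `if (binding.traceCategoryState[0] !== 0) { ... }` per
  // request, with no call back into C++.
}
```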
@nodejs/http @jasnell appreciate any reviews on this PR
From what I can tell, this change increases the complexity of the code for a low performance gain. Is it worth doing this then?
lgtm
I agree with @RaisinTen. Just removing a function from a sample doesn't bring any value to the project. Unless the code added improves the performance significantly, I don't think we would want that tradeoff.
As I understand it, the current question is: how many lines is it worth introducing for this performance improvement? Honestly, I don't know; from my perspective, I think it's worth adding 124 lines to remove the overhead of calling … in code like this:

```js
const http = require('http');

let numOfReqs = 0;

const dispatch = function (req, res) {
  const reply = 'Hello World';
  res.writeHead(200, { 'Content-Type': 'text/plain', 'Content-Length': reply.length });
  res.end(reply);
  numOfReqs++;
  if (numOfReqs > 1e5) {
    server.close();
  }
};

const server = new http.Server(dispatch);
server.listen(3000);
```

This change alone will not benefit that much, but a couple of these changes could be beneficial in the long term. And of course, if you have any ideas on how to make this improvement with less code, I can try to work on that; until then, I think this is the best we can get to eliminate this performance penalty.
It's not about lines of code, but complexity. If you compare both versions, it's way easier to maintain/read the current approach than the one you are trying to introduce. As pointed out by the benchmarks, there's no overhead on …

Just because it's not appearing in the samples, it doesn't mean the performance penalty isn't there. It might be inlined, and considering this is called in C++ land, you might want to profile it using kernel stacks. Anyway, what I'm trying to say is that there's no point (yet) in this PR that makes the application better, and due to the nature of this operation, any improvement might be non-relevant.
The penalty was transferred to the initialization of the buffer. To check the …
Any suggestions on what to do? The goal of this PR was achieved and the solution, I think, is the most performant we could get; the buffer (from what I understand) can only be exposed using … The trace flag can be enabled by embedders, so there is no way to achieve this only in JS land (ref).
Without making this PR even more complex, I don't think so.
Closed due to the discussion at nodejs/performance#102
This PR aims to fix the issue raised on Node.js Performance: nodejs/performance#81

The function isTraceHTTPEnable is a little bit expensive; in general, that function should not even appear in the stack trace, since it just checks whether the trace is enabled. The solution was inspired by nodejs/performance#81 (comment) and basically exposes, through an array, the C++ pointer of the flag that tells whether the HTTP trace is enabled or not.
To Reviewers
Some interesting files that could be looked at to review this code:

- node/deps/v8/src/builtins/builtins-trace.cc, lines 107 to 123 in 300f68e
- node/deps/v8/src/libplatform/tracing/tracing-controller.cc, lines 295 to 344 in 300f68e
- node/src/tracing/trace_event.h, lines 62 to 73 in 300f68e