[async hooks] proposal for standard CLS API - request for feedback #345
There's also nodejs/node#26540, which is currently under active development too. It's worth comparing the two to consider the flexibility and performance of each. It's a fairly performance-sensitive area, and is prone to memory issues. We should consider carefully what our needs are from this API. |
@Qard The description section in my PR has a complete overview, but let me compare my implementation with nodejs/node#26540 here. Pros of proposed API:
Cons of proposed API:
Of course, I'm prepared for my PR to be totally ignored, as it was with nodejs/node#27172. |
As already mentioned in nodejs/node#26540 the async What I miss in general in both CLS APIs is the ability to stop propagation, or to avoid it in some cases. E.g. it's formally correct that a context is propagated forever in case of e.g. a Besides that, I miss functionality that allows triggering other side effects on context change. See for example how |
In case of
That sounds like an advanced API to me, which is not required in my use cases. But if there will be multiple votes to include it into the API, I could implement it in a similar manner as in your implementation. |
This requires an extra, manual call at all these locations which is exactly the opposite of a "just works" CLS as you have to clutter your application code with hints towards CLS instead of one time configuration. |
I can hardly believe in existence of "just works" stop-context-propagation configuration API. Exact logic heavily depends on the application needs. But I believe that having a way to completely stop the propagation and free the memory is a must for a CLS API. |
I have modified
$ ./node benchmark/async_hooks/async-resource-vs-destroy.js benchmarker=autocannon
async_hooks/async-resource-vs-destroy.js n=1000000 method="callbacks" type="async-local" benchmarker="autocannon": 20,277.6
async_hooks/async-resource-vs-destroy.js n=1000000 method="async" type="async-local" benchmarker="autocannon": 15,877.6
async_hooks/async-resource-vs-destroy.js n=1000000 method="callbacks" type="async-context" benchmarker="autocannon": 16,922.41
async_hooks/async-resource-vs-destroy.js n=1000000 method="async" type="async-context" benchmarker="autocannon": 11,841.2
async_hooks/async-resource-vs-destroy.js n=1000000 method="callbacks" type="async-resource" benchmarker="autocannon": 23,582.4
async_hooks/async-resource-vs-destroy.js n=1000000 method="async" type="async-resource" benchmarker="autocannon": 18,407.2
As expected, |
@puzpuzpuz is the call to |
Probably, but it's hard to tell for sure. Their public APIs are quite different and |
@puzpuzpuz actually this whole part would go away eventually: https://github.com/nodejs/node/pull/26540/files#diff-0bb01a51b135a5f68d93540808bac801R201-R214 In the end, the only data structure that would remain would be the map used for holding context. I would like to keep it as it is closer to the original CLS implementation that most APM vendors have copied over the years. I am pretty opinionated on the instantiation of only one Async Hook for the whole process as I find it faster than having multiple ones. |
Yes, it will remove one However, this change won't add copy-on-write semantics for With
Nothing prevents APM vendors or 3rd-party library authors from building a thin wrapper that provides a On the other hand, users who want to store a single value in CLS API won't have to pay for an additional
|
The
I am not sure I get the question, I would simply do a
Fair point, but a few comments on the PR actually asked how close it is to the existing implementations. IMO it is a design goal to rely on something that is already used.
I don't think this
TBH, this choice is mostly based on nodejs/node#16222 (comment). but this is probably more discussions to have in the respective PRs. |
Oh, I've missed that change. So, it puts the
A
As far as I understand, the main goal of core APIs is to provide flexible (and sometimes low-level) APIs that can be used directly or used to build user-land libraries on top of them. But maybe this time it's an exception to the rule.
I believe that for core APIs, every data structure counts.
Not sure if that comment is applicable to CLS API. Anyway, it can be easily changed in both directions without any changes in the external behavior. |
@vdeturckheim May I ask which APM vendor implementations you refer to? My search shows different implementations: DataDog: In the before hook they set their context by a sync enter call and in the after hook they clear it. Elastic: This is different, they do a lookup based on OpenTelemetry: There a I'm working at Dynatrace (unfortunately our agent is not open source so no link 😢) and we use mostly the same mechanism as NewRelic/DataDog. There is cls-hooked which has an API which reminds me of AsyncStorage. I think this lib is usually used by application programmers but not APMs. APMs just share the idea and the mapping to async_hooks. As there was some link above towards the ticket to use async_hooks in domains: would it be possible to port Domains to use the proposed CLS system to have a proof of concept? |
As far as I know, porting domains to Nothing prevents us from adding such an advanced API to the proposed implementation, say, as a port of similar functionality from your PR (nodejs/node#27172). That could easily be done in the future or in the initial implementation, depending on feedback from the async_hooks/diagnostics groups. |
I think it should be a goal to have an API that can support multiple consumers and just work. So many times I've seen companies try to run multiple APM vendors in production and it crashes their app because they conflict with each other. It's very difficult coding defensively enough to not break when another APM vendor does something unexpected to the environment--a common thing I've seen is regular properties redefined with |
Thanks for bringing this important concern up. In the proposed implementation |
I think each having their own hooks is actually a minor point against them because of the additional overhead. The less repetitive code paths we can introduce in shared logic the better, while still retaining functional isolation. |
Could you tell me what is the reason behind that overhead? Extra context switches between native code and JS or it's something different? It's a trivial change to do, but before doing that I'd like to understand how critical it is. |
Well, multiple hooks should be merged together on JS-side already, in async_hooks internals, but that's an implementation detail. This does however mean that there will be separate hook objects with separate associated event handlers, each with their own closing scope. Given that this is a very hot path, it would be best to minimize that as much as possible. APM vendors typically want a unique object per-request, not shared globally, which means it should be able to register the hooks once but generate context objects out of those hooks many times. I've also sometimes seen the layered object approach where each async branch gets a new object with its prototype set to that of the parent tick--while that is a powerful design, it's also much more expensive and can lead to undesired scenarios like orphaned branches when activity is not directly in the async call path which leads to the request end. In my experience, the best approach overall in terms of performance and reliability is to be able to simply have an always-running background hook and the ability to at any arbitrary time start a context object which is passed down the async tree as-is. |
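For readers following along, the approach described above (one always-running hook, plus the ability to start a context object at any time and have it passed down the async tree as-is) can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from either PR: the `contexts` map and the `startContext` / `currentContext` helpers are hypothetical names, and `AsyncResource` merely stands in for an incoming request.

```javascript
'use strict';
const { createHook, executionAsyncId, AsyncResource } = require('async_hooks');

// asyncId -> context object. Hypothetical storage, not code from either PR.
const contexts = new Map();

// One always-on hook for the whole process.
createHook({
  init(asyncId, type, triggerAsyncId) {
    // Hand the parent's context object to the child as-is
    // (no copy and no prototype chain per async branch).
    const ctx = contexts.get(triggerAsyncId);
    if (ctx !== undefined) contexts.set(asyncId, ctx);
  },
  destroy(asyncId) {
    // Needed to free map entries; this is the destroy-hook cost
    // discussed elsewhere in the thread.
    contexts.delete(asyncId);
  },
}).enable();

// Hypothetical helper: start a context at an arbitrary point in time.
function startContext(ctx) {
  contexts.set(executionAsyncId(), ctx);
  return ctx;
}

// Hypothetical helper: read the context of the current execution scope.
function currentContext() {
  return contexts.get(executionAsyncId());
}

// Usage: AsyncResource stands in for an incoming request here.
const request = new AsyncResource('DemoRequest');
request.runInAsyncScope(() => {
  startContext({ requestId: 1 });
  // The timer callback sees the same object that was started above.
  setTimeout(() => console.log(currentContext()), 0);
});
```

The hook is registered once, while context objects are created per request, which is the split described in the comment. The sketch ignores exit/stop-propagation semantics entirely.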
@Qard Update. Done that. |
Thanks @Qard for stepping into this! @Flarna, I was not talking about the implementation but about the external API. The original CLS implementation exposes Regarding the big picture, I have discussed with multiple TSC members and collaborators during the collaborator summit in Montréal. Everyone seemed to agree with the need for a high-level API to provide such features in core. It also seems that this was agreed upon during a diagnostics summit (nodejs/node#26540 (comment) and nodejs/node#26540 (comment)). At this point, I am afraid that we (and I am very responsible for it) will end up bikeshedding too much in this discussion. PR reviews should happen in PRs for better traceability. |
@vdeturckheim Yes, this was discussed at the diag summit last March - I participated there 😁. This was the main reason why I tried and reviewed your Main issue with async hooks is that they are still experimental. At the diag summit we assumed that this keeps user space modules from starting to use Getting the AsyncHooks API out of experimental is hard for several reasons (it exposes internals, it is easy to use wrongly, ...). The idea was to add a CLS API in core which doesn't expose internals and therefore is easier to get out of experimental and fits most use cases currently covered by async hooks. Async hooks could be used internally and stay experimental to cover the remaining use cases. So yes, adding something is hopefully enough to reach the target that Meanwhile my motivation to further work on CLS in core is not that high anymore. Partly because most of my input was ignored anyway 😢 but besides that I have the feeling there is no shared understanding of what a CLS should actually do. Besides adding CLS to core, some more points regarding async hooks were discussed at the diag summit, such as using the resource instead of the id. The idea was to get rid of the destroy hook - at least for the "main" use case, CLS. The destroy hook is quite bad regarding performance, especially if |
Now with all the mentioned PRs closed, where does this continue? |
I think once we are comfortable we are not going to change the API in the short term we should open a new issue to collect input on the API and then do a blitz to promote/ask for feedback on the API pointing to that issue. Make sense? |
Maybe then, once CLS is merged, there should be some related description that is easy to find from the Async Hooks part of the documentation... Then, when developers find an issue, they would know where and how to ask properly? (@vdeturckheim) (@puzpuzpuz) what do you think? |
Documentation already includes a section for I think, this should be enough to point users in the right direction. Do you have any ideas on possible enhancements on your mind? |
Sorry for the poor explanation. I meant adding a note about where exactly users can post an issue. For example, there could be a quote for users:
|
@wentout I think it might be that we would use: "If you will find an issue in this api, please comment on issue #XYZ in the diagnostic repo. You can find more related information here: #345" This is basically your suggestion, just that instead of new issues we ask for people to comment on the issue that we've opened to capture the input in advance. |
This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made. |
Hi guys,
I believe that the upcoming executionAsyncResource() function (nodejs/node#30959) will allow building a simple and robust CLS API as a part of async_hooks. I've created an experimental PR that shows what this API could look like: nodejs/node#31016
Early reviews and feedback from you would be very valuable.
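To make the idea concrete, here is a minimal sketch of how a CLS store can be built on `executionAsyncResource()`, assuming a Node.js version that ships that function. It is not the API from nodejs/node#31016: the `kStore` symbol and the `setStore` / `getStore` helpers are hypothetical names chosen for illustration.

```javascript
'use strict';
const { createHook, executionAsyncResource, AsyncResource } = require('async_hooks');

// Hypothetical private key under which the store lives on each resource.
const kStore = Symbol('cls-store');

createHook({
  init(asyncId, type, triggerAsyncId, resource) {
    // Copy the store reference from the resource that is executing right now
    // onto the resource that is being created.
    const current = executionAsyncResource();
    if (current) resource[kStore] = current[kStore];
  },
  // No destroy hook and no id-keyed map: the store reference goes away
  // together with the resource that carries it.
}).enable();

// Hypothetical helper: attach a value to the current execution scope.
function setStore(value) {
  executionAsyncResource()[kStore] = value;
}

// Hypothetical helper: read the value visible in the current scope.
function getStore() {
  return executionAsyncResource()[kStore];
}

// Usage: AsyncResource stands in for an incoming request here.
const request = new AsyncResource('DemoRequest');
request.runInAsyncScope(() => {
  setStore({ requestId: 1 });
  // The Timeout resource received the store reference in init().
  setTimeout(() => console.log(getStore()), 0);
});
```

Compared with a map keyed by async id, this variant needs no cleanup in a destroy hook, which is the main attraction of the resource-based approach.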