-
-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use diagnostics_channel in layers #96
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the overhead of adding diagnostics_channel to express enabled and disabled?
lib/layer.js
Outdated
@@ -65,9 +78,23 @@ Layer.prototype.handle_error = function handle_error(error, req, res, next) { | |||
return next(error) | |||
} | |||
|
|||
req.layerStack = req.layerStack || [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend against creating a new array for every request if diagnostics_channel is not enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair. Seemed like a possibly useful feature outside of the diagnostics_channel stuff itself, but sounds like there's already another layer-tracking thing in discussion, so I'll take a closer look at that. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May also need to consider what the expected behavior is when different versions of this module are mixed together.
I think this is actually two features added (not that I think that should pose a problem). It both tracks the layers hit on a FWIW, I think this approach is fine with that layer array, but it will still require most folks to iterate to determine the last "route" layer which is typically what folks want when they ask for this feature. Existing PR for "layer tracking": #86 |
Yes @wesleytodd , I thought the same thing regarding this PR. Looking at the tracking this adds, I think it is done differently from what I've seen the others doing so far, including that linked PR. Specifically, it doesn't store the matched layers/routes on the request, instead it is trying to keep just a stack of the current hierchy of where it is at the moment, popping them back off once it leaves (vs keeping them for tracking which ones matched). |
Yeah, and maybe that is not a problem, but also if we can solve the long standing ask and also deliver on the needs to reporting to the diag channel I think we would want to. |
Yeah, so the layer matching bits are basically because what an APM will want from this diagnostics_channel data is to be able to lookup what is the full path to the route I'm at right now. Ideally we would be able to attribute any async activity to the route or middleware it originated from. I had initially tried to use the |
For the "send a response" case you can just use |
Gotcha. I see you are just string concatening together regexps and arrays in your end result path string, so I'm not sure that is going to work well as a general feature. |
Doesn't look like As for the path string, it's a concatenation of the routing path strings before being converted to regex. For example: const router = new Router()
const hello = new Router()
router.use('/hello', hello)
hello.get('/:name', ...) This router structure would result in a concatenated path of |
It sounds like you are assuming the users are passing a string in the first place. Take the following router combo, though: const router = new Router()
const hello = new Router()
router.use(['/hello', /^\/(?:good)?bye/, '/aloha'], hello)
hello.get('/:name', ...) |
That's fine. It will use the stringified form of the regex. The purpose is just to have a unique-to-that-route name to match transaction data to. If that name is |
Ah, I see. But of course this router allows you to declare the same route name multiple times, as it does not enforce that route names are unique. This is because of things like const router = new Router()
const hello = new Router()
router.use('/hello', hello)
router.get('/hello/:name', (req, res) => res.end(`hello, ${name}`))
hello.param('name', (req, res, next, name) => next(/^[a-z]$/.test(name) ? null : 'route'))
hello.get('/:name', (req, res, next) => res.end(`hola, ${name}`)) The above has effectively two different routes named |
lib/layer.js
Outdated
} catch (err) { | ||
req.layerStack.pop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this just be put into a finally
block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'd run after the next(...)
though, which would interfere with subsequent routes, as far as I can tell. Could do that if the next(...)
on the following line was in a process.nextTick(...)
, but that'd have a performance hit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. But isn't the one within the try
already running after the next
call? Like if the middleware sync calls next
then it will run after, but if the middleware async calls next
it will be before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, true. Probably a better place to put that code. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, it looks like adding anything non-trivial to your tests seems to highlight the weirdness here. For example, I added the cors
module to the nested routes
test like the following:
it('nested routes have multiple layers with paths', function (done) {
var outer = new Router();
var inner = new Router();
inner.get('/:name', function (req, res) {
res.end();
});
outer.use('/hello', cors(), inner);
function end() {
assert.strictEqual(joinLayerStack(handleRequest.request.layerStack), '/hello/:name/');
done();
}
outer.handle({ url: '/hello/world', method: 'GET', headers: {} }, { end, setHeader: noop }, noop);
});
But it failed the test listing the path as /hello/hello/:name/
, as I would surmise that it is performing the pop operation too late, after the middleware is getting iterated through (the more middleware added, the more times /hello
appears).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a possible fix for the layer timing issue. Not sure it's the best solution, but seems to work. Still unsure if the layerStack approach is the best way. Willing to accept any suggestions on better ways to track the overall routing state of the app. 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yea. It seems seems to have issues with the latest change keeping track of the stack. For example adding inner.get('/world', (req, res, next) => next())
above inner.get
produces the route as /hello/world/:name
.
As far as a best solution... I think perhaps it may be better to maybe start from the top so we can understand what, exactly, we are trying to accomplish in this pull request? The PR title and description only mentions adding hooks for diagnostics_channel
, but almost all the following discussion does not seem related to the actual diagnostics_channel
usage, and rather adding a completely new feature, independent of the diagnostics_channel
feature.
Should these two things be split apart into two PRs (first suggested at #96 (comment)) or can the description be updated to describe what the goal of this PR is? That may help better focus the conversation and code work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capturing routing information is the whole point of adding diagnostics_channel
, so this PR doesn't really make sense without it. I'll update the description to elaborate on that need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I get it, but let's think iteratively here, if you are interested in getting things landed instead of waiting for the entire pie to be built. For example, folks want to construct the path even outside of diagnostics_channel
, so having access to the path is an independent ask of adding diagnostics_channel
support. In addition, if you didn't have that layer stack hanging off the req
object, you still already have access to a whole bunch of information, including that the request is being handled, that an error occurred, what the http method is, the http path, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We get all that directly from http already. The only context we're really missing from express is the routing information. I only intended the layer tracking stuff in this to be used by diagnostics_channel. Maybe there are more general-purpose uses for that information, but I lack the context and the time to do anything about those right now. This PR was created primarily as a demonstration that the diagnostics_channel
API is usable in more complex scenarios. It's not intended to be landed any time soon as the diagnostics_channel
feature itself hasn't even landed yet. If you want something more complex from this, I'm probably not going to get to it for awhile.
I don't think it's a big issue that the paths aren't unique. It's just a bucketing mechanism. If a bucket might contain data from two routes that do different things but share the same path, it's not ideal, but there's not really any other great way to differentiate them. Most routes internally have some degree of branching anyway, so it's expected that every request in that bucket might not look exactly the same. It doesn't break anything if data from multiple routes winds up in the same bucket, it just makes the data maybe slightly less meaningful. Also, small side note: the HTTP method is also generally included in the string used as the bucket key. I'm not concerned about two routes winding up with the same name, so long as that name can be instructive to the user on where in their app to look for the code. If they happen to have multiple routes with the same name, it's on them to look at those routes and figure out on their own which one a given trace came from. Route-naming is very much best-effort in APM. There's lots of cases where we get stuff like |
|
||
function end() { | ||
assert.strictEqual(joinLayerStack(handleRequest.request.layerStack), '/hello/:name/'); | ||
assert.strictEqual(handleError.error, error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this validate that the handle error layer stack is pointing to the /
path for the error handler that was executed that called res.end()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure we should be treating error handlers as top-level routes, that's an implementation detail. It makes more sense from the APM perspective to treat them as continuations of whatever route the error handler was reached from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, interesting. I was assuming your wanted the layer stack to be the current stack of layers--you didn't note that certain layers were going to be treated differently. That is likely a very important point that was left out of your explanation.
So you are saying that even an error handler that is itself a route would not be listed? I.e. app.get('/blog/:slug', (err, req, res, next) = {})
It seems like when you have a complex app where you have a lot of error handlers all defined at a lot of different paths, it may be useful to understand which of your error handlers was the one that ran the error handling, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking a look, I'm not sure if this behavior is why the following will generate a layer stack showing the "route" is /hello/:name/hello/:val
rather than showing it as /hello/:val
when called as GET /hello/100
:
var router = new Router();
router.get('/hello/:name', (req, res, next) => {
if (!isNaN(Number(req.params.name))) next('route')
else next()
}, (req, res) => {
res.end()
});
router.get('/hello/100', (req, res) => {
res.end()
})
done(); | ||
} | ||
|
||
router.handle({ url: '/hello/world', method: 'GET' }, { end }, noop); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as a general comment: I'm not sure if you planed to clean this up later, so apologies if so, just wanted to call out that we don't want to be passing in mock-like objects to the router handling methods; they should be the real Node.js HTTP objects like all the other tests so we validate that everything is working as new Node.js versions come out and these objects change over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, the changes are all copied and pasted from the PR I made directly to express, which apparently did tests differently. I'll clean up the tests at some point, when I get back to this. 👍
7e90c9e
to
8643ec6
Compare
@Qard I am working on getting this repo back on track for the renewed plans for v5. Are you interested in landing this still? Let me know so I can get it on our plans if so. |
It would still be of value to have diagnostics_channel in there, yes. Though it probably needs reworking at this point to get it up-to-date. At the time we opted for targeting fastify as a routing framework to test diagnostics_channel with instead as express and related projects had somewhat stagnated at that time so we didn't see the changes as likely to actually land any time soon at that point. I'll share with the team at Datadog to see if we want to put some time into updating this in the near future. No big deal if you're pushing for release soon though--we can add support later if necessary. Thanks for the reminder though! I had entirely forgotten I made this. 😅 |
Yeah this would likely be a minor release so not a big deal to wait, but I was just trying to wrangle all the open things to make sure we had a clear plan in place and this still does seem valuable to me. Let me know if you folks have time to work on this! |
This is a proof-of-concept implementation of the diagnostics_channel feature I'm working on in Node.js core. The value here is for APM products to be able to capture changes to routing state through the lifecycle of a request. This should of course not be merged until the module exists in Node.js itself. I'm just creating this now to prove the value to userland and to make sure a more complex framework can benefit from it.
APM vendors want to use events from
diagnostics_channel
to know when a route has changed and what the full routing path to the route is so they can bucket trace data based on the full routing path. For example, requests to/hello/world
and/hello/stephen
might map to a/:name
route on a nested router under/hello
which should be able to produce/hello/:name
as the full routing path. This requires some method of tracking the active layer so the point at which the diagnostics_channel events are published there will be valid routing information present for the subscriber to gather from the request.I'll leave this as a draft PR until diagnostics_channel lands in Node.js core.