Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: AxiosPlugin async() / resync() #21

Merged
merged 8 commits into from
Jan 5, 2021
Merged

Conversation

tom-pytel
Copy link
Contributor

@kezhenxu94 This fixes my problem, let me know if it fixes the double span record problem for you.

@tom-pytel
Copy link
Contributor Author

Ok, well this still has problems but curious if it solves the double span so let me know on that.

@kezhenxu94 kezhenxu94 added bug Something isn't working plugin labels Jan 3, 2021
@kezhenxu94 kezhenxu94 added this to the 0.2.0 milestone Jan 3, 2021
@kezhenxu94
Copy link
Member

This doesn't seem to fix the bug of duplicate spans,

@tom-pytel
Copy link
Contributor Author

This doesn't seem to fix the bug of duplicate spans,

I need a test case I can use to be able to track this down locally on my side.

@kezhenxu94
Copy link
Member

kezhenxu94 commented Jan 3, 2021

This doesn't seem to fix the bug of duplicate spans,

I need a test case I can use to be able to track this down locally on my side.

$ pwd
~/workspace/skywalking-nodejs

$ npx ts-node tests/plugins/axios/server.ts &

$ npx ts-node tests/plugins/axios/client.ts &

$ curl http://localhost:5001/axios

And because this generates invalid spans, the Web UI only renders the correct one, hence you need to see the raw spans in the storage, or via curl like this:

curl 'http://localhost:8080/graphql' -X 'POST' \
  -H 'Content-Type: application/json;charset=utf-8' \
  --data-binary '{"query":"query queryTrace($traceId: ID!) {\n  trace: queryTrace(traceId: $traceId) {spans { traceId segmentId spanId  parentSpanId refs { traceId parentSegmentId parentSpanId type} serviceCode startTime endTime endpointName type peer component isError layer}}}","variables":{"traceId":"e3e4d04a95d74f6580f6a9bdec2f789b"}}'

Replace the trace id with the one on the web ui

@tom-pytel
Copy link
Contributor Author

The problem was that on return from the request hook before there was no way to tell if then next exit span was going to be another axios request or the http request for this axios request or even if it would be a completely unrelated http request (no easy way to tell if it belongs to this axios request), so either async() needed to be applied or no. The solution was to do a deeper axios hook as close to the http request as it could be and luckily this was possible so that the http request even wound up being in the same async context as the hook, so no need for async() here.

I discovered another problem though which is not even axios but more general since it exists with normal http request as well. I am not sure if it was working before or if I just did not notice, but if I do my test with the multiple async requests like:

await Promise.all(new Array(100).fill().map(() => downloadPageA('localhost', 8100, '/')));

Only the first 14-21 or so have a correlated entry span in express server. This is the case whether the request is axios or just simple http. Not sure if this is a hard-coded limit somewhere or some resource is being exhausted, could you take a look?

@tom-pytel
Copy link
Contributor Author

Review this error please?

@kezhenxu94
Copy link
Member

kezhenxu94 commented Jan 3, 2021

Review this error please?

Will take a look today. Thanks.

UPDATE: checking

@kezhenxu94
Copy link
Member

kezhenxu94 commented Jan 4, 2021

Well this doesn't look good even on web UI

Actual:
image

Expected:
image

Ignore the component name in latter picture, just for reference of the hierarchy

@tom-pytel
Copy link
Contributor Author

Well this doesn't look good even on web UI

Actual:
image

Expected:
image

Ignore the component name in latter picture, just for reference of the hierarchy

Ok, well this version actually solves all the problems I was having in my stuff with Axios so this has to be a tweak somewhere, will check. But the double spans (http + axios) for outgoing requests are gone no?

@kezhenxu94
Copy link
Member

But the double spans (http + axios) for outgoing requests are gone no?

I'd say it is worse, the duplicated spans are gone but the segments are duplicated now.

@kezhenxu94
Copy link
Member

THE DATA

2021-01-04T00:10:01.8592483Z console.info
2021-01-04T00:10:01.8592898Z {
2021-01-04T00:10:01.8593538Z actualData: 'segmentItems:\n' +
2021-01-04T00:10:01.8594208Z '- serviceName: server\n' +
2021-01-04T00:10:01.8597547Z ' segmentSize: 2\n' +
2021-01-04T00:10:01.8598014Z ' segments:\n' +
2021-01-04T00:10:01.8598695Z ' - segmentId: b8af9513cc0d4dbe9c16a5fad275a796\n' +
2021-01-04T00:10:01.8599530Z ' spans:\n' +
2021-01-04T00:10:01.8600055Z ' - operationName: /axios\n' +
2021-01-04T00:10:01.8600595Z ' operationId: 0\n' +
2021-01-04T00:10:01.8601120Z ' parentSpanId: -1\n' +
2021-01-04T00:10:01.8601587Z ' spanId: 0\n' +
2021-01-04T00:10:01.8602069Z ' spanLayer: Http\n' +
2021-01-04T00:10:01.8602577Z ' startTime: 1609718849175\n' +
2021-01-04T00:10:01.8603074Z ' endTime: 1609718849178\n' +
2021-01-04T00:10:01.8603564Z ' componentId: 49\n' +
2021-01-04T00:10:01.8604039Z ' isError: false\n' +
2021-01-04T00:10:01.8604530Z ' spanType: Entry\n' +
2021-01-04T00:10:01.8605013Z ' peer: server:5000\n' +
2021-01-04T00:10:01.8605533Z ' skipAnalysis: false\n' +
2021-01-04T00:10:01.8605990Z ' tags:\n' +
2021-01-04T00:10:01.8606551Z " - {key: http.url, value: 'server:5000/axios'}\n" +
2021-01-04T00:10:01.8607203Z " - {key: http.status.code, value: '200'}\n" +
2021-01-04T00:10:01.8607727Z ' refs:\n' +
2021-01-04T00:10:01.8608482Z " - {parentEndpoint: '', networkAddress: 'server:5000', refType: CrossProcess,\n" +
2021-01-04T00:10:01.8609778Z ' parentSpanId: 1, parentTraceSegmentId: 6c5f8e313841493d9ea2087125026d28, parentServiceInstance: 8c0e9de89af8,\n' +
2021-01-04T00:10:01.8611157Z ' parentService: client, traceId: a5fd41cb954d46e694d28af76247ee51}\n' +
2021-01-04T00:10:01.8611920Z ' - operationName: /json\n' +
2021-01-04T00:10:01.8612466Z ' operationId: 0\n' +
2021-01-04T00:10:01.8612986Z ' parentSpanId: -1\n' +
2021-01-04T00:10:01.8613481Z ' spanId: 1\n' +
2021-01-04T00:10:01.8613960Z ' spanLayer: Http\n' +
2021-01-04T00:10:01.8614489Z ' startTime: 1609718849180\n' +
2021-01-04T00:10:01.8615002Z ' endTime: 1609718849221\n' +
2021-01-04T00:10:01.8615502Z ' componentId: 4005\n' +
2021-01-04T00:10:01.8616112Z ' isError: false\n' +
2021-01-04T00:10:01.8616620Z ' spanType: Exit\n' +
2021-01-04T00:10:01.8617152Z ' peer: httpbin.org\n' +
2021-01-04T00:10:01.8617693Z ' skipAnalysis: false\n' +
2021-01-04T00:10:01.8618179Z ' tags:\n' +
2021-01-04T00:10:01.8618773Z " - {key: http.url, value: 'httpbin.org:80/json'}\n" +
2021-01-04T00:10:01.8620213Z " - {key: http.status.code, value: '200'}\n" +
2021-01-04T00:10:01.8620883Z ' - {key: http.status.msg, value: OK}\n' +
2021-01-04T00:10:01.8621667Z ' - segmentId: b8af9513cc0d4dbe9c16a5fad275a796\n' +
2021-01-04T00:10:01.8622290Z ' spans:\n' +
2021-01-04T00:10:01.8622806Z ' - operationName: /axios\n' +
2021-01-04T00:10:01.8623354Z ' operationId: 0\n' +
2021-01-04T00:10:01.8623880Z ' parentSpanId: -1\n' +
2021-01-04T00:10:01.8624493Z ' spanId: 0\n' +
2021-01-04T00:10:01.8624960Z ' spanLayer: Http\n' +
2021-01-04T00:10:01.8625478Z ' startTime: 1609718849175\n' +
2021-01-04T00:10:01.8625963Z ' endTime: 1609718849178\n' +
2021-01-04T00:10:01.8626453Z ' componentId: 49\n' +
2021-01-04T00:10:01.8626929Z ' isError: false\n' +
2021-01-04T00:10:01.8627419Z ' spanType: Entry\n' +
2021-01-04T00:10:01.8627910Z ' peer: server:5000\n' +
2021-01-04T00:10:01.8628427Z ' skipAnalysis: false\n' +
2021-01-04T00:10:01.8628896Z ' tags:\n' +
2021-01-04T00:10:01.8629451Z " - {key: http.url, value: 'server:5000/axios'}\n" +
2021-01-04T00:10:01.8630111Z " - {key: http.status.code, value: '200'}\n" +
2021-01-04T00:10:01.8630618Z ' refs:\n' +
2021-01-04T00:10:01.8631372Z " - {parentEndpoint: '', networkAddress: 'server:5000', refType: CrossProcess,\n" +
2021-01-04T00:10:01.8632722Z ' parentSpanId: 1, parentTraceSegmentId: 6c5f8e313841493d9ea2087125026d28, parentServiceInstance: 8c0e9de89af8,\n' +
2021-01-04T00:10:01.8633899Z ' parentService: client, traceId: a5fd41cb954d46e694d28af76247ee51}\n' +
2021-01-04T00:10:01.8634809Z ' - operationName: /json\n' +
2021-01-04T00:10:01.8635348Z ' operationId: 0\n' +
2021-01-04T00:10:01.8635873Z ' parentSpanId: -1\n' +
2021-01-04T00:10:01.8636342Z ' spanId: 1\n' +
2021-01-04T00:10:01.8636824Z ' spanLayer: Http\n' +
2021-01-04T00:10:01.8637329Z ' startTime: 1609718849180\n' +
2021-01-04T00:10:01.8637833Z ' endTime: 1609718849221\n' +
2021-01-04T00:10:01.8638319Z ' componentId: 4005\n' +
2021-01-04T00:10:01.8638815Z ' isError: false\n' +
2021-01-04T00:10:01.8639296Z ' spanType: Exit\n' +
2021-01-04T00:10:01.8639808Z ' peer: httpbin.org\n' +
2021-01-04T00:10:01.8640335Z ' skipAnalysis: false\n' +
2021-01-04T00:10:01.8640812Z ' tags:\n' +
2021-01-04T00:10:01.8641383Z " - {key: http.url, value: 'httpbin.org:80/json'}\n" +
2021-01-04T00:10:01.8642059Z " - {key: http.status.code, value: '200'}\n" +
2021-01-04T00:10:01.8642715Z ' - {key: http.status.msg, value: OK}\n' +
2021-01-04T00:10:01.8643280Z '- serviceName: client\n' +
2021-01-04T00:10:01.8643795Z ' segmentSize: 2\n' +
2021-01-04T00:10:01.8644244Z ' segments:\n' +
2021-01-04T00:10:01.8644853Z ' - segmentId: 6c5f8e313841493d9ea2087125026d28\n' +
2021-01-04T00:10:01.8645372Z ' spans:\n' +
2021-01-04T00:10:01.8645880Z ' - operationName: /axios\n' +
2021-01-04T00:10:01.8646402Z ' operationId: 0\n' +
2021-01-04T00:10:01.8646916Z ' parentSpanId: -1\n' +
2021-01-04T00:10:01.8647388Z ' spanId: 0\n' +
2021-01-04T00:10:01.8647864Z ' spanLayer: Http\n' +
2021-01-04T00:10:01.8648451Z ' startTime: 1609718849161\n' +
2021-01-04T00:10:01.8648967Z ' endTime: 1609718849163\n' +
2021-01-04T00:10:01.8649462Z ' componentId: 49\n' +
2021-01-04T00:10:01.8649940Z ' isError: false\n' +
2021-01-04T00:10:01.8650429Z ' spanType: Entry\n' +
2021-01-04T00:10:01.8650931Z ' peer: localhost:5001\n' +
2021-01-04T00:10:01.8651461Z ' skipAnalysis: false\n' +
2021-01-04T00:10:01.8651924Z ' tags:\n' +
2021-01-04T00:10:01.8652504Z " - {key: http.url, value: 'localhost:5001/axios'}\n" +
2021-01-04T00:10:01.8653160Z " - {key: http.status.code, value: '200'}\n" +
2021-01-04T00:10:01.8653765Z ' - operationName: /axios\n' +
2021-01-04T00:10:01.8654281Z ' operationId: 0\n' +
2021-01-04T00:10:01.8654805Z ' parentSpanId: -1\n' +
2021-01-04T00:10:01.8655287Z ' spanId: 1\n' +
2021-01-04T00:10:01.8655755Z ' spanLayer: Http\n' +
2021-01-04T00:10:01.8656269Z ' startTime: 1609718849164\n' +
2021-01-04T00:10:01.8656748Z ' endTime: 1609718849223\n' +
2021-01-04T00:10:01.8657244Z ' componentId: 4005\n' +
2021-01-04T00:10:01.8657730Z ' isError: false\n' +
2021-01-04T00:10:01.8658215Z ' spanType: Exit\n' +
2021-01-04T00:10:01.8663419Z ' peer: server:5000\n' +
2021-01-04T00:10:01.8665589Z ' skipAnalysis: false\n' +
2021-01-04T00:10:01.8666069Z ' tags:\n' +
2021-01-04T00:10:01.8667797Z " - {key: http.url, value: 'server:5000/axios'}\n" +
2021-01-04T00:10:01.8668501Z " - {key: http.status.code, value: '200'}\n" +
2021-01-04T00:10:01.8669169Z ' - {key: http.status.msg, value: OK}\n' +
2021-01-04T00:10:01.8670012Z ' - segmentId: 6c5f8e313841493d9ea2087125026d28\n' +
2021-01-04T00:10:01.8670546Z ' spans:\n' +
2021-01-04T00:10:01.8671068Z ' - operationName: /axios\n' +
2021-01-04T00:10:01.8671593Z ' operationId: 0\n' +
2021-01-04T00:10:01.8672117Z ' parentSpanId: -1\n' +
2021-01-04T00:10:01.8672588Z ' spanId: 0\n' +
2021-01-04T00:10:01.8673070Z ' spanLayer: Http\n' +
2021-01-04T00:10:01.8673580Z ' startTime: 1609718849161\n' +
2021-01-04T00:10:01.8674079Z ' endTime: 1609718849163\n' +
2021-01-04T00:10:01.8674721Z ' componentId: 49\n' +
2021-01-04T00:10:01.8675224Z ' isError: false\n' +
2021-01-04T00:10:01.8675720Z ' spanType: Entry\n' +
2021-01-04T00:10:01.8676224Z ' peer: localhost:5001\n' +
2021-01-04T00:10:01.8676754Z ' skipAnalysis: false\n' +
2021-01-04T00:10:01.8677207Z ' tags:\n' +
2021-01-04T00:10:01.8677796Z " - {key: http.url, value: 'localhost:5001/axios'}\n" +
2021-01-04T00:10:01.8678460Z " - {key: http.status.code, value: '200'}\n" +
2021-01-04T00:10:01.8679071Z ' - operationName: /axios\n' +
2021-01-04T00:10:01.8679590Z ' operationId: 0\n' +
2021-01-04T00:10:01.8680108Z ' parentSpanId: -1\n' +
2021-01-04T00:10:01.8680573Z ' spanId: 1\n' +
2021-01-04T00:10:01.8681049Z ' spanLayer: Http\n' +
2021-01-04T00:10:01.8681554Z ' startTime: 1609718849164\n' +
2021-01-04T00:10:01.8682053Z ' endTime: 1609718849223\n' +
2021-01-04T00:10:01.8682551Z ' componentId: 4005\n' +
2021-01-04T00:10:01.8683035Z ' isError: false\n' +
2021-01-04T00:10:01.8683518Z ' spanType: Exit\n' +
2021-01-04T00:10:01.8684004Z ' peer: server:5000\n' +
2021-01-04T00:10:01.8684527Z ' skipAnalysis: false\n' +
2021-01-04T00:10:01.8684990Z ' tags:\n' +
2021-01-04T00:10:01.8685563Z " - {key: http.url, value: 'server:5000/axios'}\n" +
2021-01-04T00:10:01.8686297Z " - {key: http.status.code, value: '200'}\n" +
2021-01-04T00:10:01.8686968Z ' - {key: http.status.msg, value: OK}\n'
2021-01-04T00:10:01.8687338Z }

@tom-pytel
Copy link
Contributor Author

But the double spans (http + axios) for outgoing requests are gone no?

I'd say it is worse, the duplicated spans are gone but the segments are duplicated now.

Actually I see what the problem is and it is not Axios (at least not directly). First of all, since Axios is async the test case client.ts should be creating an async request handler function and awaiting it (like the server.ts does correctly). The second and more important part is that the http plugin does not handle async request handlers AT ALL, so need to add that.

@tom-pytel
Copy link
Contributor Author

I am using a slightly condensed debug log for easier reading, have a look at both the client and server below and notice the first 4 lines of each, that is the entry span being created and closed BEFORE the async exit spans even get executed. That segment close needs to be deferred via await until the async http request handler terminates.

client.ts:

18506 725008eaa7574e4e8ee4d3f495ba1f7e 45 C.newEntrySpan: [] 0
18506 725008eaa7574e4e8ee4d3f495ba1f7e 45 C.start: 0
18506 725008eaa7574e4e8ee4d3f495ba1f7e 45 C.stop: 0
18506 725008eaa7574e4e8ee4d3f495ba1f7e 45 C.segment-finished
18506 725008eaa7574e4e8ee4d3f495ba1f7e 53 C.newExitSpan: [] 1
18506 725008eaa7574e4e8ee4d3f495ba1f7e 53 C.start: 1
18506 725008eaa7574e4e8ee4d3f495ba1f7e 53 C.newExitSpan: [ 1 ] 1
18506 725008eaa7574e4e8ee4d3f495ba1f7e 53 C.async: 1
18506 725008eaa7574e4e8ee4d3f495ba1f7e 77 C.resync: 1
18506 725008eaa7574e4e8ee4d3f495ba1f7e 68 http stop: 15467.935335000046
18506 725008eaa7574e4e8ee4d3f495ba1f7e 68 C.stop: 1
18506 725008eaa7574e4e8ee4d3f495ba1f7e 68 C.segment-finished

server.ts

18476 3a638dea53e145599075e9ddaaa369e2 50 C.newEntrySpan: [] 0
18476 3a638dea53e145599075e9ddaaa369e2 50 C.start: 0
18476 3a638dea53e145599075e9ddaaa369e2 50 C.stop: 0
18476 3a638dea53e145599075e9ddaaa369e2 50 C.segment-finished
18476 3a638dea53e145599075e9ddaaa369e2 58 C.newExitSpan: [] 1
18476 3a638dea53e145599075e9ddaaa369e2 58 C.start: 1
18476 3a638dea53e145599075e9ddaaa369e2 58 C.newExitSpan: [ 1 ] 1
18476 3a638dea53e145599075e9ddaaa369e2 58 C.async: 1
18476 3a638dea53e145599075e9ddaaa369e2 80 C.resync: 1
18476 3a638dea53e145599075e9ddaaa369e2 71 http stop: 20777.711658000015
18476 3a638dea53e145599075e9ddaaa369e2 71 C.stop: 1
18476 3a638dea53e145599075e9ddaaa369e2 71 C.segment-finished

@tom-pytel
Copy link
Contributor Author

tom-pytel commented Jan 4, 2021

There we go, segment finishes after everything else:

92096 87a08e834abe44b881c20a55387d0e14 42 C.newEntrySpan: [] 0
92096 87a08e834abe44b881c20a55387d0e14 42 C.start: 0
92096 87a08e834abe44b881c20a55387d0e14 50 C.newExitSpan: [ 0 ] 1
92096 87a08e834abe44b881c20a55387d0e14 50 C.start: 1
92096 87a08e834abe44b881c20a55387d0e14 50 C.newExitSpan: [ 0, 1 ] 1
92096 87a08e834abe44b881c20a55387d0e14 50 C.async: 1
92096 87a08e834abe44b881c20a55387d0e14 70 C.resync: 1
92096 87a08e834abe44b881c20a55387d0e14 54 C.stop: 0
92096 87a08e834abe44b881c20a55387d0e14 61 C.stop: 1
92096 87a08e834abe44b881c20a55387d0e14 61 C.segment-finished

Also threw in a fix for http server peer tag for free :)

@tom-pytel
Copy link
Contributor Author

Maybe error due to changed peer.

@tom-pytel tom-pytel closed this Jan 4, 2021
@tom-pytel tom-pytel reopened this Jan 4, 2021
@tom-pytel
Copy link
Contributor Author

Note, there is still an out of order span stop between the async axios request and async http handler, so not quite done, looking.

@tom-pytel
Copy link
Contributor Author

Well, it turns out that stopIfNotStopped() at the end of http request response event handler we removed before to fix timings was there for another important reason - order of span stop. The new stop mechanism res.on('end', stopIfNotStopped) seems actually to be the best way of doing it and gives even better timing consistency than without it.

@tom-pytel
Copy link
Contributor Author

So have a look, if you update the expected data for the tests I think this is better than ever. Wound up fixing something we didn't even know was a problem - lack of http async server handlers. But take these post-release problems as a demonstration of how important regression tests will be for this agent. Async programming can be tricky and it is very easy to break something, even with minor changes. I suggest you add a lot of tests to this agent, especially multiple concurrent requests like my test example. If you do not, I guarantee you the next person who adds a plugin and is not familiar with the inner workings of this will break everything.

@kezhenxu94
Copy link
Member

@tom-pytel I'm finding a method to test more than simple cases we have now, do you have more tests want to complement except for the tests given above?

@kezhenxu94 kezhenxu94 merged commit ad95f72 into apache:master Jan 5, 2021
@tom-pytel
Copy link
Contributor Author

@tom-pytel I'm finding a method to test more than simple cases we have now, do you have more tests want to complement except for the tests given above?

My test is more of a quick hack, a more thorough test would be something like the following:

  • For each possible type of entry span make sure you execute several dozen of them concurrently on the server module you are testing.
  • For each instance entry span make several dozen exit spans using every possible exit span type.
  • Each exit span type should go to target URLs which return a success, some error urls (404 and 500 specifically) as well as ignore by path and ignore by extension.
  • Not all of these exit paths have to be remote, to save time many of them can be simple local responders, but some remote targets must be included.

The difficulty with something like this is the order of execution is guaranteed non-deterministic so checking output is tricky. But if you have this test you will very very safe against inadvertent breakage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants