
Envoy doesn't send the full Zipkin Span context for a receiving node in an inter-service call #1414

Closed
clguiman opened this issue Aug 8, 2017 · 12 comments
Labels
area/tracing enhancement Feature requests. Not bugs or questions.

Comments

@clguiman
Contributor

clguiman commented Aug 8, 2017

Using the following configuration:
service-a-1 sits behind an envoy instance and service-b-3 sits behind another envoy instance.
service-a-1 makes an http request to service-b-3 and as a result of that, the Zipkin drivers will send two spans back (one for each envoy):

Span from the service-a-1's envoy:
[{ "traceId": "00008d850394663b", "name": "service-b", "id": "00000bd7e40daf64", "parentId": "00008d850394663b", "timestamp": 1502215619776870, "duration": 159474, "annotations": [{ "timestamp": 1502215619776870, "value": "cs", "endpoint": { "ipv4": "172.18.0.7", "port": 0, "serviceName": "service-a-1" } }, { "timestamp": 1502215619936415, "value": "cr", "endpoint": { "ipv4": "172.18.0.7", "port": 0, "serviceName": "service-a-1" } }], "binaryAnnotations": [{ "key": "node_id", "value": "service-a-1" }, { "key": "guid:x-request-id", "value": "3ba8d358-0c36-9698-8342-ec001c845379" }, { "key": "request_line", "value": "GET / HTTP/1.1" }, { "key": "host_header", "value": "service-b" }, { "key": "downstream_cluster", "value": "service-a-1" }, { "key": "user_agent", "value": "service-a-1" }, { "key": "x-ot-span-context", "value": "00008d850394663b;00000bd7e40daf64;00008d850394663b;cs" }, { "key": "x-envoy-downstream-service-cluster", "value": "service-a-1" }, { "key": "x-envoy-downstream-service-node", "value": "service-a-1" }, { "key": "request_size", "value": "0" }, { "key": "response_code", "value": "200" }, { "key": "response_size", "value": "53" }, { "key": "response_flags", "value": "-" }] }]

Span from the service-b-3's envoy:
[{ "traceId": "00008d850394663b", "name": "", "id": "00000bd7e40daf64", "parentId": "00008d850394663b", "annotations": [{ "timestamp": 1502215619777207, "value": "sr", "endpoint": { "ipv4": "172.18.0.5", "port": 0, "serviceName": "service-b-3" } }, { "timestamp": 1502215619936176, "value": "ss", "endpoint": { "ipv4": "172.18.0.5", "port": 0, "serviceName": "service-b-3" } }], "binaryAnnotations": [] }]

The second span shares its context (traceId, id, parentId) with the first one, but it doesn't have any binary annotations.
Is it possible to configure Envoy to send the binary annotations for the incoming request as well, and not just for the outgoing request?
This is useful when each envoy sends its data to a different Zipkin server, which then cannot correlate the two spans.

If this is not currently supported, are you willing to accept a PR that adds the redundant data when a configuration key is present?

@mattklein123 mattklein123 added enhancement Feature requests. Not bugs or questions. area/tracing labels Aug 8, 2017
@mattklein123
Member

@clguimanMSFT definitely open to patches here. #1364 also needs attention, which may or may not be related. Feel free to propose solutions/PRs for one or both issues. (I'm not a Zipkin expert so I don't have much to say right now.)

cc @rshriram @fabolive @adriancole

@clguiman
Contributor Author

clguiman commented Aug 8, 2017

I dug around the source code, and it seems the decision to skip the binary annotations is by design:
https://github.com/lyft/envoy/blob/0efa18c36d6f789562690b5053c2c4b00987979e/source/common/tracing/zipkin/zipkin_tracer_impl.h#L62

My proposal would be to add an optional setting to the tracer (the tracer is already referenced by all ZipkinSpan objects): shouldAlwaysAddTags, defaulting to false, and then change this line of code to decide whether the tag should be added or not:
https://github.com/lyft/envoy/blob/0efa18c36d6f789562690b5053c2c4b00987979e/source/common/tracing/zipkin/zipkin_tracer_impl.cc#L23

@codefromthecrypt
Contributor

I think this logic is more defensive about tags than other tracers'. I don't know of any other tracer that actually does this, as it implies (as you noticed) that you will always have data from another node. While there is a chance that the data differs between nodes, having it can be useful when investigating problems. For example, Zipkin has a "raw trace" feature which lets folks see which node saw what value.

TL;DR: a default of always recording tags is more idiomatic. If we want to reduce tags for size reasons, it's probably better to pick which ones are most important for client vs. server rather than only adding tags on one side.

@codefromthecrypt
Contributor

PS: thanks for shining a light on this!

@mattklein123
Member

If it's more idiomatic to just add the tags, I would just fix the code to not block them, IMO. But I have no strong opinion if folks think a config option is better (we could also default the config option to always add).

@clguiman
Contributor Author

clguiman commented Aug 9, 2017

I don't think the extra size is a problem.
Being able to choose which tags to send would, in this particular case, just make the configuration more cumbersome, so I'll add a single flag covering all of them.
I will still keep it off by default so we don't break any existing configuration that relies on the current behavior. I'll submit a PR tomorrow.

@rshriram
Member

rshriram commented Aug 9, 2017

Thanks for spotting this. I am with Matt and Adrian. There is no point in making this a config option. Let's just add the tags from the receiving end as well.
(I think Fabio did this for efficiency reasons, and because this was going to run in an Istio setup where all proxies send data to the same Zipkin server.)

@fabolive
Contributor

fabolive commented Aug 9, 2017

The reason I refrained from adding the tags on both sending and receiving ends was that the Zipkin server was actually duplicating the binary annotations with the same key (making the UI look messy) rather than properly merging them. I agree that, in principle, always setting the binary annotations is preferable. @adriancole Zipkin should never duplicate binary annotations with the same key. Is the duplication I alluded to a behavior you were aware of?

@codefromthecrypt
Contributor

codefromthecrypt commented Aug 9, 2017 via email

@clguiman
Contributor Author

clguiman commented Aug 9, 2017

Here's the PR: #1418

@clguiman
Contributor Author

clguiman commented Aug 9, 2017

Thanks for the fast replies and assistance in getting my PR merged!

@clguiman clguiman closed this as completed Aug 9, 2017
@mattklein123
Member

Fixed by #1418

nezdolik pushed a commit to nezdolik/envoy that referenced this issue May 4, 2024
We had 2 nearly identical implementations. Thankfully, the C++ template facility lets us produce 2 different runtime functions (for different type widths) without duplicating source.

Amend github issue envoyproxy#1414