Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propagate sampling_priority #248

Merged
merged 8 commits into from
Nov 14, 2017
Merged

Conversation

ufoot
Copy link
Contributor

@ufoot ufoot commented Nov 10, 2017

Follow-up to #245

Most significant change here is that context is used to propagate the trace_id, parent_id and sampling_priority. This makes it more alike the Python implementation and allows more code sharing.

@ufoot ufoot requested review from p-lambert and palazzem November 10, 2017 21:33

ctx ||= call_context
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ that call_context was creating buggy behaviors in corner cases, it broke some tests as I implemented the id propagation using the context. Those tests were fixed when removing this call and creating a fresh context when none is given. Which is what we do in Python BTW.

@@ -264,7 +264,7 @@ def test_start_span_child_of_context
thread = Thread.new do
mutex.synchronize do
@thread_span = tracer.start_span('a')
@thread_ctx = tracer.call_context
@thread_ctx = @thread_span.context
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nasty, but yes, getting the span context is the right thing to do here.

@sampled = false
@parent_trace_id = options.fetch(:trace_id, nil)
@parent_span_id = options.fetch(:span_id, nil)
@sampled = options.fetch(:sampled, false)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mixed here. Python implementation uses true by default. We could change this, it possibly breaks some internal tests, but none of them should impact behavior from a user POV. AFAIK client sampling is not really massively used today in Ruby, and as a side note, first span added would turn this to true. But I think it's worth noticing.

def sampling_priority
hdr = header(HTTP_HEADER_SAMPLING_PRIORITY)
# It's important to make a difference between no header,
# and a header defined to zero.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required because some hosts might have priority sampling disabled and/or have outdated libraries so they are not propagating the information. In this case, we want to just ignore it and not set it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member

@p-lambert p-lambert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for helping me a lot with all these changes! My comments are mostly about following the principle of single responsibility in the DistributedHeaders. Since we're designing new stuff, I think it's worth discussing these topics.

def sampling_priority
hdr = header(HTTP_HEADER_SAMPLING_PRIORITY)
# It's important to make a difference between no header,
# and a header defined to zero.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

headers[HTTP_HEADER_SAMPLING_PRIORITY] = span.sampling_priority.to_s
end
env.merge! headers
env.delete(HTTP_HEADER_SAMPLING_PRIORITY) unless span.sampling_priority
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for this #delete?

Copy link
Contributor Author

@ufoot ufoot Nov 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it on purpose indeed, my sense was that, if the headers say: there's no sampling priority, it should actually be removed. After all that function is named inject! so it sort of implies: take whatever step is needed so that this env, afterwards, reflects the state of the headers passed to it, including the fact that if it has no sampling priority, then this field should just not exist at all. It's really a corner case I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

env.delete(HTTP_HEADER_SAMPLING_PRIORITY) unless span.sampling_priority
end

def self.extract(env)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More a design nitpick, but I'd prefer to do less things here and keep DistributedHeaders more like as presenter around the environment object. I think the concept of a Context should belong to somewhere else (probably to whoever call this).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, my sense is that this probably, indeed, does not belong to something like distributed_headers, true enough. How about, then, renaming it to, say, propagation/http as in https://github.com/DataDog/dd-trace-py/blob/master/ddtrace/propagation/http.py Because, it's of general usage, it's proven useful in Python (extracting and injecting those values from headers to hash is quite a straightforward use case and many libraries may benefit from it, even if today only Faraday uses it).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I'd prefer to defer things like this until they're proven useful, but I'm pretty sure you can anticipate more use-cases and scenarios for it than I do, so I'm fine with it.

@finished_spans = 0
@current_span = nil
end

def trace_id
@mutex.synchronize do
return @parent_trace_id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't mind changing this, let's get rid of the return just to be more idiomatic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -37,11 +64,21 @@ def current_span
end
end

def set_current_span(span)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

value
end

def self.inject!(span, env)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is basically related to the faraday integration I would prefer to keep it there. I don't think we should add methods to the public API of this class unless it is reusable for multiple entities.

require 'ddtrace/span'
require 'ddtrace/distributed_headers'

class DistributedHeadersTest < Minitest::Test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
Sorry for leaving this behind!

@ufoot ufoot modified the milestones: 0.9.2, 0.10.0 Nov 14, 2017
@ufoot ufoot merged commit 19c2fb4 into master Nov 14, 2017
@palazzem palazzem deleted the christian/sampling_priority_propagation branch November 14, 2017 17:30
@palazzem palazzem added the core Involves Datadog core libraries label Nov 30, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Involves Datadog core libraries
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants