Add Resource to ShouldSample. #1658

Oberon00 · 2021-04-28T11:17:41Z

Fixes #1588

Changes

Add Resource as Sampler argument.

specification/trace/sdk.md

spec-compliance-matrix.md

bogdandrutu · 2021-05-25T15:16:52Z

@Oberon00 let's split this into 2 PRs to allow progress:

that adds the InstrumentationLibrary which we've already agreed to support sampling by instrumentation library, then when we find a very good use-case for Resource we can discuss, before that point as pointed in the issue an implementation can be done anyway.
The second PR will be when we have good use-case for Resource and prove that the alternative does not work.

carlosalberto · 2021-09-16T12:40:53Z

I'm against this change. Anyone who wants to write a sampler that uses the Resource can provide a constructor (or whatever is language-equivalent) for that Sampler and it can use the Resource however it wants. I'd rather not add more stuff to the sampler API that we need, and I don't see how this is a necessary addition.

This (written by @jkwatson ) was the general feeling during a SIG call a few weeks ago (hence the comment), namely: do not add stuff to the API at all, unless absolutely necessary.

@bogdandrutu would you mind answering the latest's answers on this topic?

Oberon00 · 2021-09-16T13:00:21Z

I agree that adding stuff to the API (the SDK's plugin API in this case) should be minimized generally. But one goal behind this is IMHO keeping the implementation flexible, allowing later changes, by not exposing too many internals.
But in this case, by not providing the Resource, we are forcing sampler developers that need the resource to make the assumption that the resource is something static that will always be the same for all spans that the sampler processes.

Crucially, this contradicts the fact that the readable span ("SpanData") has the Resource as a public property, which would allow the assumption that different Spans provided to the same SpanProcessor (exporter) can have different resources.
In effect, exporters need to go through the trouble of preparing for distinct resource objects across a batch or different batches, while samplers (configured on the same level as spanprocessors) have to be coded with the limitation that all spans have the same resource. It follows, that at least one of the following is currently broken:

A sampler that needs to know the Resource of the Span it samples
Different resources within the same TracerProvider (see comment above Add Resource to ShouldSample. #1658 (comment))
Reusing the same sampler instance across different TracerProviders

I think the last one doesn't matter much (though we seem to support it for spanprocessors/exporters). But the first and second one are a problem.

bogdandrutu · 2021-09-16T13:30:56Z

@Oberon00 I think exposing the Resource in the SpanData was a mistake that hurts a lot the performance having to do the mapping for OTLP. So I would not repeat that mistake, also the reasoning was because SpanProcessor can be shared across multiple TraceProviders which in practice (never seen this case) could have been avoided and share exporter if needed.

So I think as explained before unless we prove with clear real code example that this does not work we should not do it. And by real code I need to see the project that tries to do something with resource and cannot because of this problem.

bogdandrutu · 2021-09-16T13:34:53Z

Also the example and the motivation that an object is shared between multiple instances is not a good reason to add unnecessary APIs that will limit performance, see #1658 (review)

bogdandrutu · 2021-09-16T13:41:53Z

Hmm - it's hard to word it, but I think that there indeed may only be one precise use case. It is the concept of "sampling by attribute" - the attributes for a span are both what's set in its builder and its Resource I think. So we could never consider attribute-based sampling complete without the Resource. We currently have a PoC of attribute-based sampling for Java

open-telemetry/opentelemetry-java-contrib#70

which I would suspect to be spec'd in some form some day. Currently this sampler can't look at Resource but it seems like it should be able to.

@anuraaga please give me a real example when sampling by resource attributes is useful in practice.

I've seen extremely limited number of Applications with more than one Provider (personally 0, but let's say they exists) and usually in all these cases I expect that the attributes that are different in the Resource to be the "service" (or at least that was the motivation before to support multiple Providers) and if that is the case you can simply just ignore the resource attributes and set different Samplers per service instance of the Provider.

Oberon00 · 2021-09-16T14:38:17Z

I think exposing the Resource in the SpanData was a mistake that hurts a lot the performance having to do the mapping for OTLP. So I would not repeat that mistake, also the reasoning was because SpanProcessor can be shared across multiple TraceProviders which in practice (never seen this case) could have been avoided and share exporter if needed.

I agree with that actually. I think exporters and span processors and samplers should only ever used with a single TracerProvider (span processor). There is also #508 open which would change how resources are passed (probably we need to close that one as its too late).

But now we have the Resource in the SpanData and I feel we already have reached consensus that #1298 is worthwhile to implement. If we pass the resource to the sampler, we can just start passing the new resource with the new attributes.
If the resource is given to the sampler outside the SDK's control, we now need to have a callback like OnResourceChanged. And the sampler developers would need to implement that callback to update their resource. Probably the callback could also fire at initial resource creation... Then we have the same features for samplers that this PR would provide plus much more flexibility but also some complexity.

@anuraaga What do you think of adding an OnResourceChanged callback once we implement #1298? Would that improve usability?

anuraaga · 2021-09-16T14:47:55Z

@anuraaga What do you think of adding an OnResourceChanged callback once we implement #1298? Would that improve usability?

I think if the SDK provided a dependency injection mechanism like this (presumably the callback would be called on the first resource too so in practice a way to inject Resource) then that solves the usability issue too. It seems like a bigger change and I'd be worried such a change would get blocked for some other reason, but it seems like the only think to try.

bogdandrutu · 2021-09-16T15:18:45Z

@Oberon00 if we were to do a breaking change (since this one is) I would suggest to better investigate the path where all of these components are associated with only one Resource and maybe have a hook when the "Resource" changes to support late Resource resolution :)

This is my thought, I would better invest my time in doing something that we know is more performant and can improve things than adding more "simple" but not optimal changes.

Oberon00 · 2021-09-16T15:22:00Z

if we were to do a breaking change (since this one is)

We have already "broken" this interface for 1.7 because we added the InstrumentationLibrary, which is why we are discussing this PR now.

jmacd · 2021-09-16T21:24:44Z

Could we resolve this PR with an additional statement that language SIGs MAY provide optimized Sampler interfaces that bind certain arguments such as Resource and InstrumentationLibrary when they are known to be constant? That way we allow this PR in but @bogdandrutu's reservations are met, and we don't block #1298 (because I wrote "when they are known to be constant", any optimized API the language SIGs create will have to contend with 1298 when the time comes).

Oberon00 · 2021-09-17T08:03:10Z

How could that optimized interface look like?

jmacd · 2021-09-17T11:20:32Z

There is already language in the SDK specification that offers SDKs a lot of freedom in how TraceID and SpanIDs are generated, I'm referring to "When asked to create a Span, the SDK MUST act as if doing the following in order:"

Step 2 is "Query the Sampler's ShouldSample method".

I imagine the specification just saying what the act of calling ShouldSample does, then we shouldn't have to specify any alternate APIs. The optimized Sampler API MUST act the same as the non-optimized Sampler API, that's all.

In Go, ShouldSample takes a struct with some possibly-constant fields. The optimized Sampler API would provide a factory method for binding constant fields which returns the former Sampler implementation.

Oberon00 · 2021-09-17T12:18:50Z

I think @bogdandrutu has switched arguments over the course of the review: Originally, the suggestion was to add a more complex family of interfaces in order to gain some performance (#1658 (comment)). Now, the main issue with this PR seems to be that we should keep the API minimal and not add unneeded things (#1658 (comment)).
I think adding more implementation options would actually be worse in that regard.

jmacd · 2021-09-17T15:02:25Z

I believe the underlying problem is that we do not have a data model for Resources.

Oberon00 · 2021-09-17T16:00:58Z

I believe the underlying problem is that we do not have a data model for Resources.

Hmm, we do have a data model for the resources themselves IMHO, but the current solution of how they are connected to the actual telemetry is unsatisfying. E.g. you pass it to a tracer provider and separately to a meter?

jmacd · 2021-09-17T19:10:31Z

I believe the underlying problem is that we do not have a data model for Resources.

Hmm, we do have a data model for the resources themselves IMHO

I apologize, I over-stated that. I meant to say the data model we have is not strong enough, but that's a topic we should continue discussing in #1298. The resource SDK spec says:

A Resource is an immutable representation of the entity producing telemetry as Attributes.

We haven't made any requirements about the "entity" described there, and it means everyone has a different opinion about what this PR means.

the current solution of how they are connected to the actual telemetry is unsatisfying

Yes. Also, as with #1298, the resource data model is not strong enough to accommodate Prometheus and OpenMetrics (#1298 (comment)).

In #1588 you wrote:

Suppressing unwanted spans depending on the instrumentation library name

Let me suggest that the problem is you're trying to use a Sampler API to suppress Spans. When DynaTrace originally introduced the Named Tracer and instrumentation library concept, it was stated that an SDK may decide to suppress spans based on their library. I.e., it's the Tracer's responsibility to suppress spans.

I've made statements about how treating Sampler as a means of filtering spans is causing confusion which is an obstacle for probability sampling here: #1844 (comment)

If your goal is to suppress spans based on instrumentation library, that should happen before the Sampler is called. If your goal is to suppress spans based on resource, you can likewise do that before the Sampler is called. If your goal is to setup sampling rules that are applied on a per-instrumentation library basis, we should be specifying a View Configuration mechanism for spans that will configure a per-library Sampler (for example).

Based on this above reasoning, I'm on the fence about whether InstrumentationLibrary belongs in the sampler API at all. The reason InstrumentationLibrary is different than Resource, and why it was easier to accept, is that InstrumentationLibrary is at least a variable across the "entity producing telemetry", whereas Resource is not.

jmacd

I'm removing my approval, now that I've understood the motivation better. If the goal is to suppress spans based on Resource (or Instrumentation Library), better to do that before the Sampler is called.

Oberon00 · 2021-09-17T21:18:19Z

If your goal is to suppress span ... better to do that before the Sampler is called.

That's not really possible, the sampler is the only hook that is exposed for that purpose. Replacing the TracerProvider with a custom implementation is usually not practical (one reason being that Resource APIs are only defined in the "official" SDK).

anuraaga · 2021-09-18T01:22:14Z

If the goal is to suppress spans based on Resource (or Instrumentation Library), better to do that before the Sampler is called.

At least for me the goal is to change sample rate based on service name, or k8s pod name, or hostname, type of resource attributes. Often this is something like seeing metrics report failure on one service or host and wanting to crank up the sampling rate for that service temporarily.

This works when sampling configuration is managed centrally, for example with X-ray remote sampling, or even if it just reads a file from etcd.

So attributes in the resource seem useful to have available when configuring sampling. I guess instrumentation library may be more geared to suppression, not sure but not resource.

iNikem · 2021-09-18T08:17:28Z

If your goal is to suppress span ... better to do that before the Sampler is called.

That's not really possible, the sampler is the only hook that is exposed for that purpose. Replacing the TracerProvider with a custom implementation is usually not practical (one reason being that Resource APIs are only defined in the "official" SDK).

@jmacd I have to agree with @Oberon00 here: where else can we suppress spans (and how) if not by calling Sampler and returning SamplingDecision.DROP?

bogdandrutu · 2021-09-20T16:40:29Z

@anuraaga

At least for me the goal is to change sample rate based on service name, or k8s pod name, or hostname, type of resource attributes. Often this is something like seeing metrics report failure on one service or host and wanting to crank up the sampling rate for that service temporarily.

This works when sampling configuration is managed centrally, for example with X-ray remote sampling, or even if it just reads a file from etcd.

So attributes in the resource seem useful to have available when configuring sampling. I guess instrumentation library may be more geared to suppression, not sure but not resource.

All these attributes you know them from outside (in case of a remote config) so you can decide which config to send to each pod/service instead of having a configuration for each attribute in the same file.

Let's get k8s pod name, are you suggesting that you will have 10 entries (different sampling rates for different services)? What about instead ask the remote service to tell your config and pass the Resource then?

anuraaga · 2021-09-20T23:22:58Z

What about instead ask the remote service to tell your config and pass the Resource then?

This seems like a nice design, especially for an OTel configuration service API. AFAIK current remote sampling APIs can't accept a Resource so this approach doesn't seem to support them well. And I think for medium size teams with a dozen or so microservices on on-prem hosts, a single file pushed through Ansible or such seems reasonable - I've done similar for experiment configs before, not specifically sampling. I hope OTel doesn't just assume these are bad use cases.

A span has many attributes, some of them happen to be on the span, others on the Resource. It seems inconsistent to only provide one of these when sampling though.

Oberon00 · 2021-09-22T10:36:21Z

I will close this PR if/when #1942 is merged.

Oberon00 · 2021-09-22T14:27:35Z

Closing due to merge of #1942.

jmacd · 2021-09-22T14:29:40Z

It seems inconsistent to only provide one of these when sampling though

We've made an argument that performance suffers when every Sampler invocation considers the resource or instrumentation library.

It sounds like you're asking for a standard mechanism. In the Go SDK, what you're asking for is two new interfaces:

type SDKSampler interface {
  SamplerForResource(resource.Resource) InstrumentationLibrarySampler
}

type InstrumentationLibrarySampler interface {
  SamplerForLibrary(instrumentation.Library) Sampler
}

Since all of the machinery implied by these two new interfaces is part of setting up and configuring an SDK, it falls outside the scope of OTel's current specification. You could build these and use them, and have what you want.

Should OpenTelemetry offer these interfaces? Users want configurable tracer SDKs, so I think the answer is yes.

For your simple application of a file containing resource and library parameters for sampling configuration, you can:
(1) parse the intended resource attribute to sampler settings map
(2) inspect the resource of the current SDK
(3) apply all resource attributes to the predicates used in the mapping
(4) some of the sampler settings mapped in your resource file are unreachable, their predicates are already false if resources do not match
(5) configure only the non-false predicates as the SDKSampler

the SamplerForLibrary method does roughly the same, evaluate any remaining predicates with the instrumentation library details, now the returned Sampler does exactly what you want and uses the current API -- the decisions about instrumentation library and resource are pre-bound by the time a Span arrives, and no further computation has to be done for any of the resource or library predicates.

Add InstrumentationLibrary and Resource to ShouldSample.

235bcaa

Oberon00 requested review from a team April 28, 2021 11:17

github-actions bot assigned bogdandrutu Apr 28, 2021

Fill in PR#.

c0115ac

iNikem approved these changes Apr 28, 2021

View reviewed changes

iNikem reviewed Apr 28, 2021

View reviewed changes

specification/trace/sdk.md Show resolved Hide resolved

iNikem self-requested a review April 28, 2021 12:10

carlosalberto approved these changes Apr 28, 2021

View reviewed changes