Add Getter's Keys operation. #825
Context extraction isn't exactly compute intensive but a lot of context carriers don't support random access. So if you want to implement this specification for e.g. Kafka client you will need to pass over the headers once to materialise the keys, then you have a quadratic pass extracting the values. The headers are almost always going to be quite small, but it would just be a lot lighter-weight to provide an API based on iterators over key-value pairs e.g. |
Good point. A couple of points here: context extraction is on the hot path of any telemetry system, and context is extracted even when telemetry is sampled out, so even small optimizations here are important. On some systems, well-known headers are stored in dedicated variables for faster access, so completely switching to an iterator approach may hurt performance. I want to understand the |
comments and please add more information about the use case to spec why this thing is needed
Couldn't agree more.
That's fine if the carrier supports random access; many don't even if they present an API like |
@richardstartin with the introduction of a new accessor for the fields, unless we will have technology-specific propagators, propagators would use one accessor or another. In order to address the issue you are bringing up, we actually need to have separate propagators, depending on the underlying technology to access fields in the most efficient way. Are you advocating for this? If so, this PR may not be enough to solve this issue. |
You have to do this anyway. When you get into the gritty details, hardly any context carriers are nice generic types, and they need to be handled on a case-by-case basis. I presume the challenge here is to find an abstraction which models the most context carriers in the least bad way. Concretely, what I'm advocating for is that the context extractor should have an interface like:
Iterable<Pair<String, String>> contextEntries()
as opposed to
Iterable<String> keys()
String get(String key);
Just to reiterate, since some codecs mandate case-insensitivity, the extractor implementation can't be better than linear time, but what's being proposed here is quadratic time unless the underlying context carrier supports random access, and many implementations don't. |
I don't quite agree that we are looking for an "abstraction which models the most context carriers in the least bad way". Telemetry may not be the only thing that reads headers. If a platform is optimized to read headers once and store well-known values in predefined variables, the getter approach doesn't induce any additional iteration. As we are advocating to integrate telemetry into software, this pattern may become dominant over time. A system that only supports iteration may implement a proxy layer that iterates once and stores variables for the getter to read, which is almost as good as a single iteration inside the propagator. If we switch everything to iterators, we are losing this optimization. So I'm looking to understand the scenario when propagators have to iterate over key/values. |
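The proxy layer described in the comment above could look roughly like the following. This is only a sketch under assumptions: `CarrierGetter`, `HeaderSnapshot`, and `SnapshotGetter` are hypothetical names that appear neither in this PR nor in the specification, and the carrier is modelled as a plain `Iterable` of entries.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical getter shape, modelled on the keys()/get() pair discussed in this PR.
interface CarrierGetter<C> {
    Iterable<String> keys(C carrier);
    String get(C carrier, String key);
}

// Hypothetical proxy: one pass over an iteration-only carrier, then map lookups.
final class HeaderSnapshot {
    // Case-insensitive ordering so that "Trace-Id" and "trace-id" address the same entry.
    private final Map<String, String> values = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    HeaderSnapshot(Iterable<Map.Entry<String, String>> entries) {
        for (Map.Entry<String, String> entry : entries) {
            // A later duplicate overwrites the value of an earlier one.
            values.put(entry.getKey(), entry.getValue());
        }
    }

    Iterable<String> keys() {
        return values.keySet();
    }

    String get(String key) {
        return values.get(key);
    }
}

// The getter no longer scans the carrier: every lookup hits the snapshot.
final class SnapshotGetter implements CarrierGetter<HeaderSnapshot> {
    @Override
    public Iterable<String> keys(HeaderSnapshot carrier) {
        return carrier.keys();
    }

    @Override
    public String get(HeaderSnapshot carrier, String key) {
        return carrier.get(key);
    }
}
```

The trade-off is that the snapshot allocates a map per carrier even when only one or two headers are of interest, so whether this is "almost as good" depends on the platform.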
@SergeyKanzhelev sorry I think I may have jumped ahead because I am already familiar with the problem this proposal aims to solve and also the implementation challenges it presents.
Motivation
Why might
Get by key doesn't support either of these extraction use cases, hence the need for
How would context extraction look with this proposal?
A specific implementation would iterate over the keys, match them against known values, and get the values the implementation is interested in. With prefix based baggage, this might look like what's below.
MyContext extract(C carrier, Getter<C> getter) {
    MyContext extracted = new MyContext();
    for (String key : getter.keys(carrier)) {
        if ("trace-id".equalsIgnoreCase(key)) {
            extracted.traceId = getter.get(carrier, key);
        } else if ("parent-id".equalsIgnoreCase(key)) {
            extracted.parentId = getter.get(carrier, key);
        } else if (key.toLowerCase().startsWith("baggage-")) {
            // strip the prefix to recover the baggage key
            extracted.baggage.put(key.substring("baggage-".length()), getter.get(carrier, key));
        }
    }
    return extracted;
}
Shortcomings of this proposal
Unless
Let's look at the API of one possible context carrier, Kafka client
public interface Headers extends Iterable<Header> {
    // irrelevant methods removed
    /**
     * Returns just one (the very last) header for the given key, if present.
     *
     * @param key to get the last header for.
     * @return this last header matching the given key, returns none if not present.
     */
    Header lastHeader(String key);
}
For reference, here's the actual implementation of
@Override
public Header lastHeader(String key) {
    checkKey(key);
    for (int i = headers.size() - 1; i >= 0; i--) {
        Header header = headers.get(i);
        if (header.key().equals(key)) {
            return header;
        }
    }
    return null;
}
The only plausible way to implement
public class KafkaHeadersGetter implements Getter<Headers> {
    @Override
    public Iterable<String> keys(final Headers headers) {
        // first pass: materialise the keys
        final List<String> keys = new ArrayList<>();
        for (final Header header : headers) {
            keys.add(header.key());
        }
        return keys;
    }
    @Override
    public String get(final Headers headers, final String key) {
        // lastHeader scans the headers linearly and returns null when the key is absent
        final Header header = headers.lastHeader(key);
        return header == null ? null : new String(header.value(), StandardCharsets.UTF_8);
    }
}
This makes the loop above quadratic in the number of headers. For some carriers, e.g. RabbitMQ just uses a
Alternative for supporting case insensitivity/prefix matches
Given that there are prefix based and case insensitive codecs out there, I would suggest one of the least bad options, assuming a limited complexity budget, is actually iterating over key value pairs. I will illustrate with the Kafka example
interface Getter<C> {
Iterable<Pair<String, String>> contextValues(C carrier);
}
The loop above for extracting the values becomes:
MyContext extract(C carrier, Getter<C> getter) {
    MyContext extracted = new MyContext();
    for (Pair<String, String> pair : getter.contextValues(carrier)) {
        if ("trace-id".equalsIgnoreCase(pair.getKey())) {
            extracted.traceId = pair.getValue();
        } else if ("parent-id".equalsIgnoreCase(pair.getKey())) {
            extracted.parentId = pair.getValue();
        } else if (pair.getKey().toLowerCase().startsWith("baggage-")) {
            // strip the prefix to recover the baggage key
            extracted.baggage.put(pair.getKey().substring("baggage-".length()), pair.getValue());
        }
    }
    return extracted;
}
Which is now guaranteed to be linear in the headers in the carrier. The Kafka
public class KafkaHeadersGetter implements Getter<Headers> {
    @Override
    public Iterable<Pair<String, String>> contextValues(final Headers headers) {
        final List<Pair<String, String>> contextValues = new ArrayList<>();
        for (final Header header : headers) {
            contextValues.add(Pair.of(header.key(), new String(header.value(), StandardCharsets.UTF_8)));
        }
        return contextValues;
    }
}
This is a straw-man example, and could be implemented more efficiently with an iterator, and more efficient still would be to push key matching down to where the
public class KafkaHeadersGetter implements Getter<Headers> {
    private final Predicate<String> keyPredicate;
    public KafkaHeadersGetter(Predicate<String> keyPredicate) {
        this.keyPredicate = keyPredicate;
    }
    @Override
    public Iterable<Pair<String, String>> contextValues(final Headers headers) {
        return () -> new HeadersIterator(keyPredicate, headers.iterator());
    }
    private static final class HeadersIterator implements Iterator<Pair<String, String>> {
        private final Predicate<String> keyPredicate;
        private final Iterator<Header> headers;
        private Header next;
        HeadersIterator(Predicate<String> keyPredicate, Iterator<Header> headers) {
            this.keyPredicate = keyPredicate;
            this.headers = headers;
        }
        @Override
        public Pair<String, String> next() {
            // works whether or not the caller invoked hasNext() first
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            Header current = next;
            next = null;
            return Pair.of(current.key(), new String(current.value(), StandardCharsets.UTF_8));
        }
        @Override
        public boolean hasNext() {
            // idempotent: a matched but unconsumed header is kept until next() takes it
            while (next == null && headers.hasNext()) {
                Header candidate = headers.next();
                if (keyPredicate.test(candidate.key())) {
                    next = candidate;
                }
            }
            return next != null;
        }
    }
}
In summary, I think it's important to consider what constraints a specification might impose on the implementations, and the edge case consequences of this proposal are fairly limiting. |
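The quadratic-versus-linear argument in the comment above can be checked with a small counting sketch. It is an illustration only: `ScanCounter` and `CountingHeaders` are hypothetical stand-ins, not Kafka classes, and the lookup merely imitates the backwards linear search of `lastHeader`.

```java
import java.util.ArrayList;
import java.util.List;

final class ScanCounter {
    // Hypothetical carrier without random access: every lookup scans the list.
    static final class CountingHeaders {
        final List<String[]> entries = new ArrayList<>();
        long comparisons = 0;

        void add(String key, String value) {
            entries.add(new String[] {key, value});
        }

        // Backwards linear search, counting each key comparison.
        String last(String key) {
            for (int i = entries.size() - 1; i >= 0; i--) {
                comparisons++;
                if (entries.get(i)[0].equals(key)) {
                    return entries.get(i)[1];
                }
            }
            return null;
        }
    }

    // keys() followed by get(key) for every key: one scan per key.
    static long keysThenGet(CountingHeaders headers) {
        headers.comparisons = 0;
        List<String> keys = new ArrayList<>();
        for (String[] entry : headers.entries) {
            keys.add(entry[0]);
        }
        for (String key : keys) {
            headers.last(key);
        }
        return headers.comparisons;
    }

    // Iterating over key/value pairs: each header is visited exactly once.
    static long singlePass(CountingHeaders headers) {
        long visited = 0;
        for (String[] entry : headers.entries) {
            visited++;
        }
        return visited;
    }

    // Builds n headers with distinct keys key0 .. key(n-1).
    static CountingHeaders distinct(int n) {
        CountingHeaders headers = new CountingHeaders();
        for (int i = 0; i < n; i++) {
            headers.add("key" + i, "value" + i);
        }
        return headers;
    }

    public static void main(String[] args) {
        CountingHeaders headers = distinct(10);
        System.out.println("keys+get comparisons: " + keysThenGet(headers));
        System.out.println("single pass visits:   " + singlePass(headers));
    }
}
```

For n headers with distinct keys, the keys-then-get loop performs n(n+1)/2 comparisons (55 for n = 10), while the single pass visits n headers (10). As noted in the thread, real header sets are small, so this is about constant factors on a hot path rather than asymptotic blow-up.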
@richardstartin Thanks for this detailed example! A couple things came to mind
I think the vast majority of libraries support random access to headers so lean towards dealing with it in instrumentation but interested in hearing any thoughts! |
Agreed. Besides, even though a full Iterator pattern would help Kafka, it would not exactly improve many other libraries that expose the headers as a collection that can be accessed through a Alternatively, we could offer instead of a |
Ping @yurishkuro @tylerbenson ;) |
Sadly not - Kafka's just one example. Here's Akka HTTP, a linear search:
def getHeader(headerName: String): Optional[jm.HttpHeader] = {
  val lowerCased = headerName.toRootLowerCase
  Util.convertOption(headers.find(_.is(lowerCased))) // Upcast because of invariance
}
Supports single pass iteration over all the pairs though. Grizzly
public DataChunk getValue(String name) {
    for (int i = 0; i < count; i++) {
        if (headers[i].getName().equalsIgnoreCase(name)) {
            return headers[i].getValue();
        }
    }
    return null;
}
Supports single pass iteration via
But I think I can let this go at this point :) |
@SergeyKanzhelev feedback applied. Please re-review. |
@richardstartin in these technologies, do we need an iterator over key/value pairs or |
LGTM, but I'm seeking additional feedback on whether it is the right abstraction and whether we need an iterator over keys or over key-value pairs.
I've stated my case - this abstraction leads to accidentally quadratic loops on the critical path, albeit over small inputs. |
Co-authored-by: Sergey Kanzhelev <[email protected]>
@tylerbenson @bogdandrutu @yurishkuro Any preference regarding |
I don't have a strong preference, but there may be performance considerations where one is more efficient than the other depending on the language. I wouldn't mind leaving it up to the language to define which form is more efficient and idiomatic. If we don't want that, then I would go with |
@yurishkuro Sounds great. So will do a follow-up to mention that the language implementations are free to return the keys or keys/values in the iterator. |
@open-telemetry/specs-approvers Please review/approve this PR ;) |
This PR is coming from the need to support the "uber-" keys. If we cannot find enough approvers for the PR, it may be indicative of the fact that nobody needs this functionality in OpenTelemetry 1.0. @yurishkuro what would be blocked without this support? Maybe making clear the downside of not having this merged would convince more people to review and approve. |
Without iterator on keys the existing Jaeger context propagation format cannot be implemented. |
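To illustrate why an iterator over keys is needed here, the following is a minimal sketch of prefix-based baggage extraction in the style of Jaeger's `uberctx-` headers. The `Getter` interface below is a hypothetical shape modelled on the `keys`/`get` pair discussed in this PR, and `MapGetter` and `UberctxBaggage` are illustrative names; none of this is the actual OpenTelemetry or Jaeger API.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical getter shape: get by key plus the keys operation proposed in this PR.
interface Getter<C> {
    Iterable<String> keys(C carrier);
    String get(C carrier, String key);
}

// Getter over a plain map carrier, used only for illustration.
final class MapGetter implements Getter<Map<String, String>> {
    @Override
    public Iterable<String> keys(Map<String, String> carrier) {
        return carrier.keySet();
    }

    @Override
    public String get(Map<String, String> carrier, String key) {
        return carrier.get(key);
    }
}

final class UberctxBaggage {
    static final String PREFIX = "uberctx-";

    // The baggage key names are not known in advance, so get-by-key alone cannot
    // find them; the keys must be enumerated and matched against the prefix.
    static <C> Map<String, String> extract(C carrier, Getter<C> getter) {
        Map<String, String> baggage = new LinkedHashMap<>();
        for (String key : getter.keys(carrier)) {
            if (key.toLowerCase(Locale.ROOT).startsWith(PREFIX)) {
                baggage.put(key.substring(PREFIX.length()), getter.get(carrier, key));
            }
        }
        return baggage;
    }
}
```

With a map carrier the `get` call is cheap; on carriers without random access the same loop incurs the repeated scans raised earlier in the thread.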
* Add of Getter's Keys operation.
* Apply feedback.
* Update specification/context/api-propagators.md
Co-authored-by: Sergey Kanzhelev <[email protected]>
* Remove extra trailing space.
Co-authored-by: Sergey Kanzhelev <[email protected]>
Fixes #713
Addresses part of #433
Changes
Getter now has an additional Keys() operation in order to get all the keys in a given carrier. It will help support variable key names formats, such as the Jaeger one (as it has variable uberctx- entries).
The change is relatively trivial, as it's a simple addition. Most existing Propagators won't need to update its behavior, and only a few ones (such as the OT or the Jaeger one) can use the additional Keys() operation to do additional work.
One thing to pay attention to is the existing HttpTextPropagator.fields() operation, which is kept but a note is added, mentioning it only includes predefined fields.
Wondering if we should use either fields or keys for both mentioned operations, and have a more homogeneous experience.
Related PR shown as prototype: open-telemetry/opentelemetry-java#1549
cc @tylerbenson @yurishkuro @bogdandrutu