OTLP #1
Force-pushed from d27b436 to 8664419.
* RFC: Remove SpanData
* formatting
* respond to comments, add related issues
* finishing touches
* rename file
* respond to comments
* edit withOption wording
* Update 0004-remove-spandata.md Co-Authored-By: Yang Song <[email protected]>
* fix typo/github links
* Adds language for including span ID, removes it from open questions
* respond to comments
* Rename 0004-remove-spandata.md to text/0002-remove-spandata.md
* Change finish to end
* Clarify where the isOutOfBand option passed.
Force-pushed from 1c46025 to f0f8cc1.
* Add Makefile and Circle CI config Fixes open-telemetry#13. Currently the only job is misspell check.
* Install tools locally to guarantee same version
* Make Makefile less ambiguous
* Change config file to yml
* Top level job should be 'build'
Force-pushed from c7a9977 to 224b5d0.
![Request-Response](images/otlp-request-response.png)
`Export` request contains telemetry data and includes an ID field that is unique for each request. The value of ID field is a sequential uint64 number generated by the client. `Export` response received from the server contains an ID field that acknowledges the receipt of the request with the same ID value.
Do we really need an ID? Modern protocols have an embedded way to associate requests with responses, even in an async mode.
You are right, an ID would not be needed if we used a transport that inherently supports matching of requests and responses (such as unary gRPC). However, as far as I know a gRPC stream does not have such functionality. All it supports is an ordered stream of messages in both directions. It is up to the application to interpret which response corresponds to which request.
One approach when using streaming would be to rely on the order of the requests and responses. However, it places an unnecessary limitation on server implementations. I want server implementations to be able to send out-of-order responses to Export requests (think for example of a thread-pool based server implementation which gets requests from the stream, processes them in a thread pool and sends responses over the stream; in this situation it is difficult to guarantee the order of responses).
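To illustrate, here is a rough Go sketch (not from the spec; `ExportRequest`/`ExportResponse` stand in for the generated protobuf types, and stream plumbing is omitted) of how a client could correlate out-of-order responses with pending requests by ID:

```go
// Sketch only: types and structure are assumptions for illustration.
package idtracker

import "sync"

type ExportRequest struct {
	Id uint64
	// ... telemetry payload
}

type ExportResponse struct {
	Id uint64
	// ... result code
}

// pendingTracker correlates responses with requests by ID, so the server
// is free to acknowledge requests in any order.
type pendingTracker struct {
	mu      sync.Mutex
	nextID  uint64
	pending map[uint64]chan *ExportResponse
}

func newPendingTracker() *pendingTracker {
	return &pendingTracker{pending: make(map[uint64]chan *ExportResponse)}
}

// register assigns the next sequential ID to the request and returns a
// channel on which the matching response will be delivered.
func (t *pendingTracker) register(req *ExportRequest) <-chan *ExportResponse {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.nextID++
	req.Id = t.nextID
	ch := make(chan *ExportResponse, 1)
	t.pending[req.Id] = ch
	return ch
}

// resolve is called for every response read from the stream; it delivers
// the response to whoever sent the request with the same ID, regardless of
// the order in which the server answered.
func (t *pendingTracker) resolve(resp *ExportResponse) {
	t.mu.Lock()
	ch, ok := t.pending[resp.Id]
	delete(t.pending, resp.Id)
	t.mu.Unlock()
	if ok {
		ch <- resp
	}
}
```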
### Client Modes
OTLP defines 2 modes of operation for clients: Synchronous and Pipelined.
For the pipelined mode, does it require, in case of a service behind an LB, that all the requests go to the same instance? Asking to understand if pipelining requires bidirectional streams or whether async unary RPCs are enough.
The Load Balancing discussion in this spec is targeted at the case when services behind the Load Balancer do not require that all the requests go to the same instance. The goal is to allow Load Balancers to rebalance traffic when needed.
So, you are right that unary gRPC would be enough from a functional perspective, because unary calls can be rebalanced out of the box.
The problem with unary calls is bad performance. Benchmark results show about 1.12 times more CPU usage for unary calls compared to the streaming implementation. This is one of the reasons I decided to go with gRPC streams.
- `FailedRetryable` - processing of telemetry data failed. The client SHOULD record the error and may retry exporting the same data. This can happen when the server is temporarily unable to process the data.

#### Throttling
My 2 cents here: this is a hard problem to solve and I am not sure this is the right way. Let's assume you have 5 collectors; you need to have a global throttler to know whether the other services are also overwhelmed or not.
Probably an easier solution to this is to return a FailedRetryable and implement some kind of backoff retry logic on the client. What do you think?
My 2 cents here: this is a hard problem to solve and I am not sure this is the right way. Let's assume you have 5 collectors; you need to have a global throttler to know whether the other services are also overwhelmed or not.
You are right that generally it is a hard problem and there are no solutions that apply to all situations.
However, in some scenarios the "retry-after" approach can work well. For example, when the same set of clients is connected to the same server and sends data to it, the server can make an approximate prediction about how overloaded it is (for example by looking at the length of the incoming queue it needs to process and the rate of incoming data) and how much it needs the clients to slow down.
If the server is unable to calculate the throttle period, then it can use exponential backoff logic to calculate the throttling period or use some fixed number.
This is analogous to the Retry-After header in HTTP (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Retry-After), which the server can include when returning status code 429 to ask the client to slow down: https://tools.ietf.org/html/rfc6585#section-4
Probably an easier solution to this is to return a FailedRetryable and implement some kind of backoff retry logic on the client. What do you think?
The problem with this approach is that it is not possible for the client to understand the reason why the export request failed. I believe it is important for clients to distinguish general retryable errors from the case when the server is overloaded and asks the client to slow down. This is important for observability purposes (to know why exactly the export is failing).
If you think that having the numeric throttle period in the response is superfluous we can get rid of it and only report a boolean "overloaded" status in the response.
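For illustration only, a minimal Go sketch of the client-side retry-delay logic discussed above, assuming a hypothetical value carrying the server-suggested throttle period (zero meaning the server did not suggest one); when it is absent the client falls back to capped exponential backoff:

```go
// Sketch only: names and defaults are assumptions, not part of the spec.
package backoff

import "time"

// retryDelay picks how long to wait before retrying a FailedRetryable
// export. serverThrottle is the throttle period suggested by an overloaded
// server (0 if none was provided); attempt counts consecutive failures.
func retryDelay(serverThrottle time.Duration, attempt int) time.Duration {
	if serverThrottle > 0 {
		// The server knows best how overloaded it is (analogous to the
		// HTTP Retry-After header), so honor its suggestion.
		return serverThrottle
	}
	// Otherwise fall back to capped exponential backoff.
	const base = 500 * time.Millisecond
	const cap = 30 * time.Second
	d := base << uint(attempt)
	if d <= 0 || d > cap { // d <= 0 guards against shift overflow
		return cap
	}
	return d
}
```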
// A list of spans from a Node.
message NodeSpans {
  Node node = 1;
I would like to revisit Node. I think currently, because some of the Omnition people pushed me, we added some information like service name which should be part of the Resource.
Yes, we need to go over Node/Resource/Span/Metric and decide what the schema for this will be. I left it out of this RFC to limit the scope; I can work on it in a separate RFC and link it from this one.
text/0000-opentelemetry-protocol.md
OTLP achieves this by periodically closing and reopening the gRPC stream. This behavior is based on the fact that Level 7 load balancers that are gRPC-aware make re-balancing decisions when a new stream is opened.
After sending an `Export` request the client MUST check how much time passed since the stream was last opened and if it is longer than a configurable period of time the stream MUST be closed and opened again. After opening the stream the client SHOULD perform a `Hello` message exchange.
Should this be implemented on the server? This may be unnecessary overhead if the receiver is the otelsvc daemon, for example. Maybe one of the return properties from the server should include a restart-stream property?
If this functionality is not needed it can be disabled on the client side (by setting the timeout period to infinity).
We can recommend disabling rebalancing in client libraries by default, because they are most likely sending to a local daemon.
Agents and Collectors will have this enabled by default.
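A tiny Go sketch of the client-side re-opening check described in the quoted text; the field names and the "very large timeout disables re-opening" convention are illustrative, not mandated by the spec:

```go
// Sketch only.
package rebalance

import "time"

// streamState tracks when the current gRPC stream was opened.
type streamState struct {
	openedAt         time.Time
	rebalanceTimeout time.Duration // set to a huge value to effectively disable re-opening
}

// shouldReopen is checked after each Export request is sent: if the stream
// has been open longer than the configured period, the client closes and
// re-opens it so a gRPC-aware L7 load balancer can pick a new backend.
func (s *streamState) shouldReopen(now time.Time) bool {
	return now.Sub(s.openedAt) > s.rebalanceTimeout
}
```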
rpc Hello(HelloRequest) returns (HelloResponse) {}

// Sends a batch of telemetry data.
rpc Export(stream ExportRequest) returns (stream ExportResponse) {}
One problem with streams which I did not think about is setting a deadline per message. We do want to have a deadline probably for every request, correct? I am not sure if this is important or not but please also think about it.
Also, if we do go with unary RPCs we can remove the ack ID because then it is very easy to map a request to its response; the pipelining will also be implemented using async unary RPCs.
I am leaning more towards unary RPCs because you also removed the optimization to not send the node/resource.
One problem with streams which I did not think about is setting a deadline per message. We do want to have a deadline probably for every request, correct? I am not sure if this is important or not but please also think about it.
Yes, client implementations will definitely need some sort of timeout after which they should consider the message to be lost if they don't hear from the server. At that point they have a choice: declare the message dropped (and increment some internal "dropped" counter) or retry sending the message. I think we should go with the "dropped" approach, because retrying is risky: we don't know why the server is unable to respond to this message, and we can end up sending the same message over and over again and potentially overload the server.
I will add this to the spec.
Also, if we do go with unary RPCs we can remove the ack ID because then it is very easy to map a request to its response; the pipelining will also be implemented using async unary RPCs.
See my other response regarding unary performance and why I think we should use streaming instead.
I am leaning more towards unary RPCs because you also removed the optimization to not send the node/resource.
I did not remove the node/resource optimization. I changed the Export message format to allow including multiple nodes.
This is the OpenCensus export format:
message ExportTraceServiceRequest {
  Node node = 1;
  repeated Span spans = 2;
  Resource resource = 3;
}
This allows one Node, one Resource and multiple Spans.
I suggest changing this to the following:
message ExportRequest {
  uint64 id = 1;
  repeated NodeSpans nodeSpans = 2;
  repeated NodeMetrics nodeMetrics = 3;
}

message NodeSpans {
  Node node = 1;
  Resource resource = 2;
  repeated Span spans = 3;
}
`NodeSpans` looks like `ExportTraceServiceRequest` in OpenCensus. In OTLP you can have many of these in one `ExportRequest` (via the `repeated NodeSpans nodeSpans = 2` field). This is important for the Collector use case.
The Collector receives telemetry data from different sources. Different sources mean different Node information. One of the jobs of a Collector is to combine telemetry data into batches for more efficient processing. However, because of how the OpenCensus `ExportTraceServiceRequest` schema is defined, it is impossible to batch telemetry data from different sources (different Nodes) into one export request. This can have a dramatic impact on efficiency because it can increase the number of Export requests that the Collector must make by orders of magnitude.
We observed exactly this problem with the OpenCensus protocol just last week.
My proposal fixes this problem: one `ExportRequest` can contain telemetry data from many Nodes.
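As a rough illustration (Go structs standing in for the generated protobuf types; all names are assumptions), this is the structural difference that matters for the Collector: span batches from many different Nodes travel in a single request:

```go
// Sketch only: hypothetical Go counterparts of the proposed protobuf messages.
package batching

type Node struct{ ServiceName string }
type Resource struct{ Labels map[string]string }
type Span struct{ Name string }

type NodeSpans struct {
	Node     *Node
	Resource *Resource
	Spans    []*Span
}

type ExportRequest struct {
	Id        uint64
	NodeSpans []*NodeSpans
}

// combine packs batches that came from different sources (different Nodes)
// into one ExportRequest instead of one request per Node -- the case that
// the flat OpenCensus ExportTraceServiceRequest schema cannot express.
func combine(id uint64, batches []*NodeSpans) *ExportRequest {
	return &ExportRequest{Id: id, NodeSpans: batches}
}
```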
The optimization that you removed is that Node/Resource can be sent once per stream. You did keep the optimization to send Node/Resource once for multiple data
I think we can still have the same optimization per stream. It will be a bit more difficult for intermediary nodes like otelsvc to handle this optimization but I think it is doable.
For data received from telemetry sources (applications), if Resource is not specified in ExportRequest then the last Resource is applied ("last" means anything that is listed in the same ExportRequest in an earlier batch or in the previous ExportRequest). This will allow telemetry sources which always use the same Resource to specify a single SpanBatch in ExportRequest and specify Resource only once, in the first request. Typically ExportRequest will contain only one element in the spanBatches array.
otelsvc needs to be a bit more complex to handle this optimization. otelsvc needs to combine SpanBatch-es received from different streams into one Export request (and create bigger batches). In that case ExportRequest will contain an array of SpanBatch-es. If the SpanBatch received from a particular stream does not have a Resource specified then otelsvc will add the Resource field before combining it into a multi-batch request. To be able to do so the otelsvc receiver must cache the last received Resource for each stream. The otelsvc batcher will also be able to do batch compacting: it can eliminate unnecessary repetitions of the same Resource inside multiple batches received from the same source when creating a multi-batch request (R is Resource, S is Span, and a rectangular block composed of R and S is one ExportRequest).
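A small Go sketch of the per-stream Resource caching described above, with hypothetical types (`SpanBatch`, `Resource`) standing in for the real schema:

```go
// Sketch only.
package receiver

type Resource struct{ Labels map[string]string }

type SpanBatch struct {
	Resource *Resource // nil when the sender relies on the last sent Resource
	// ... spans
}

// streamReceiver caches the last Resource seen on one incoming stream so
// that batches which omit the Resource (the per-stream optimization) can be
// made self-contained before being combined with batches from other streams.
type streamReceiver struct {
	lastResource *Resource
}

// onBatch fills in the cached Resource when the sender omitted it, and
// updates the cache when a new Resource is present.
func (r *streamReceiver) onBatch(b *SpanBatch) *SpanBatch {
	if b.Resource != nil {
		r.lastResource = b.Resource
	} else {
		b.Resource = r.lastResource
	}
	return b
}
```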
text/0000-opentelemetry-protocol.md
2. For minor changes to the protocol future versions and extension of OTLP are encouraged to use the ability of Protocol Buffers to evolve message schema in backwards compatible manner. Newer versions of OTLP may add new fields to messages that will be ignored by clients and servers that do not understand these fields. In many cases careful design of such schema changes and correct choice of default values for new fields is enough to ensure interoperability of different versions without nodes explicitly detecting that their peer node has different capabilities.
3. More significant changes must be explicitly defined as new optional capabilities. Such capabilities SHOULD be discovered by client and server implementations during `Hello` handshake. The capabilities SHOULD be described by the payload contained in `Hello` request and response. The mandatory capabilities defined by this specification are implied and SHOULD not be explicitly encoded in the `Hello` payload. A future implementation of OTLP that wants to use a certain optional capability SHOULD use `Hello` handshake to announce its capabilities and discover capabilities of the peer and adjust its behavior to match the expectation of a peer that does not support a particular capability.
I think we should do it differently. If we do want to make a major change we can simply have an ExportV2 method on the RPC service.
I thought about using version numbers in the protocol (what you suggest with ExportV2 is having version numbers, per service call or per protocol). I decided against it for the following reason:
In the future we will likely decide to extend OTLP with various new capabilities. Some of these capabilities will probably be optional, not required for all implementations. So, imagine that we have N optional capabilities and clients and servers can implement a subset of these N capabilities. There would potentially be 2^N versions of the protocol in the wild which need to be able to talk to each other. It would not be scalable to attempt to label these protocols with version numbers.
This is the reason I decided to go with the concept of "capabilities" instead of version numbers. When a new capability is added to the protocol, it will be the responsibility of the client and server to exchange the capabilities they support during the Hello handshake.
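A minimal Go sketch of what capability negotiation during the `Hello` handshake could look like; the struct fields and the idea of taking the intersection are illustrative, not defined by this spec:

```go
// Sketch only: field names are assumptions.
package hello

// Optional capabilities announced by each peer during the Hello exchange.
type HelloRequest struct {
	Capabilities []string
}

type HelloResponse struct {
	Capabilities []string
}

// negotiated returns the set of optional capabilities supported by both
// peers; each side then adjusts its behavior to this intersection instead
// of relying on a single protocol version number.
func negotiated(client, server []string) map[string]bool {
	serverSet := make(map[string]bool, len(server))
	for _, c := range server {
		serverSet[c] = true
	}
	common := make(map[string]bool)
	for _, c := range client {
		if serverSet[c] {
			common[c] = true
		}
	}
	return common
}
```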
Experimental implementation of OTLP over WebSockets exists and was researched as an alternate. WebSockets were not chosen as the primary transport for OTLP due to lack or immaturity of certain capabilities (such as [lack of universal support](https://github.com/gorilla/websocket#gorilla-websocket-compared-with-other-packages) for [RFC 7692](https://tools.ietf.org/html/rfc7692) message compression extension). Despite limitations the experimental implementation demonstrated good performance and WebSocket transport will be considered for inclusion in a future OTLP Extensions RFC.
## Open Questions
Do we want to support config pushes to the client via the same protocol? Probably not, and we should define a different protocol for that.
I think it should not be in this RFC. Even if in the future we decide to support config pushes via the same gRPC stream, it can be defined in a future RFC as an extension to this spec.
* Four metrics RFCs to eliminate raw statistics
* Create 4 RFC drafts on metrics
* Update 0001-metric-pre-defined-labels.md Co-Authored-By: Isobel Redelmeier <[email protected]>
* Update 0001-metric-pre-defined-labels.md Co-Authored-By: Isobel Redelmeier <[email protected]>
* Refinements
* Set status to proposed
* Assign numbers 3-4-5-6
* Renumber refs
* Update status styling
* Fix misspellings
* Combine the first three into one
Force-pushed from c47c09f to f05c630.
Force-pushed from f05c630 to 58e2edd.
* Document global initializion
* Global initializer requirements
* Minor revision
* Set status
* Rename 0010-global-init.md to text/0005-global-init.md
* OTr->OTel
### Client Modes
OTLP defines 2 modes of operation for clients: Synchronous and Pipelined.
Maybe parallel instead of pipelined?
I use Pipelined as an analogy to HTTP pipelining mode: https://en.wikipedia.org/wiki/HTTP_pipelining
I think it describes the mode well. Parallel can also be used, but I think the analogy with HTTP is useful.
text/0000-opentelemetry-protocol.md
![Synchronous](images/otlp-synchronous.png)
Synchronous mode is recommended when simplicity of implementation is desirable and when the client and the server are connected via very low-latency network, such as for example when the client is an instrumented application and the server is a OpenTelemetry Service running as as a local daemon.
s/OpenTelemetry Service running/OpenTelemetry Service instance is running
Remove double "as"
Fixed.
The Server replies to every `Export` request received from the client by a corresponding `Export` response. The ID field of the response is set equal to the ID field of the request.

If the Client does not receive a response to a particular request after a configurable period of time then the client SHOULD drop the request. The client SHOULD maintain a counter of such dropped data. This is the recommended approach because the reason that the response is not received may be that the server fails to generate a response to this particular request. Attempting to re-try sending may cause more problems at the server side. This timeout SHOULD be configured to be significantly larger than it normally takes the server to respond.
Implementing a Deadline on top of a bidirectional stream may be a hard and error-prone thing. I need to think a bit more about this.
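One possible way to implement a per-request deadline on top of a bidirectional stream, sketched in Go (types and structure are assumptions, not part of the spec); a request that times out is counted as dropped rather than retried, matching the behavior described in the quoted text:

```go
// Sketch only.
package deadline

import (
	"sync"
	"sync/atomic"
	"time"
)

// pendingAcks gives each sent request its own timer; if no matching
// response arrives before the timer fires, the request is counted as
// dropped instead of being retried.
type pendingAcks struct {
	mu      sync.Mutex
	timers  map[uint64]*time.Timer
	dropped uint64 // counter of requests that timed out
}

func newPendingAcks() *pendingAcks {
	return &pendingAcks{timers: make(map[uint64]*time.Timer)}
}

// sent registers a deadline for the request with the given id.
func (p *pendingAcks) sent(id uint64, timeout time.Duration) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.timers[id] = time.AfterFunc(timeout, func() {
		p.mu.Lock()
		_, stillPending := p.timers[id]
		delete(p.timers, id)
		p.mu.Unlock()
		if stillPending {
			atomic.AddUint64(&p.dropped, 1)
		}
	})
}

// acked cancels the deadline when the response with the same id arrives.
func (p *pendingAcks) acked(id uint64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if t, ok := p.timers[id]; ok {
		t.Stop()
		delete(p.timers, id)
	}
}
```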
// Telemetry data.
repeated NodeSpans nodeSpans = 2;
repeated NodeMetrics nodeMetrics = 3;
Usually metrics have different requirements in terms of retries/load-balancing/reporting periods etc. Having them in the same protocol forces us to apply the same configuration, or at least is less intuitive. Also, even in the Receivers we would probably have different configurations for metrics/spans.
Also, if any of the OSS backends like Jaeger/Zipkin decide to support this they will support half of the "Service". Having them in different gRPC Services makes it more clear that Jaeger supports TraceService (so it supports the entire service).
text/0000-opentelemetry-protocol.md
Benchmarking of OTLP vs other telemetry protocols was done using [reference implementation in Go](https://github.com/tigrannajaryan/exp-otelproto). In this OTLP implementation telemetry data, particularly Span batches are represented using a slightly modified Protocol Buffers schema taken from OpenCensus Protocol. Primary differences from OpenCensus schema are:

- Eliminated TruncatableString and replaced by plain string.
- Eliminated AttributesMap structure and used a map directly instead.
I think you want to use a list of pairs, not a map. A list is faster to decode because no unique-key requirement on the map is checked.
I thought about this possibility. I agree lists are faster to decode. The problem I see with this is that it will make the in-memory representation of Attributes not very suitable for processing in otelsvc. Processors which need to inspect or modify specific attributes will have to do a linear search and linear list modification instead of a map lookup by key, which will make processing more expensive. Alternatively, otelsvc will need to transform the list into a map internally to make processing easier and then transform it back to a list before exporting. This will complicate the otelsvc implementation.
I am not sure the gain in decoding speed is worth the complexity it brings to otelsvc. I can do some benchmarking to see how much gain there is.
I did benchmarking of Attributes as a map vs a list. The list is about 3 times faster (in this single benchmark):
BenchmarkEncode/OpenCensus-8 10 616829553 ns/op
BenchmarkEncode/OTLP/AttrMap-8 10 529560209 ns/op
BenchmarkEncode/OTLP/AttrList-8 50 144381428 ns/op
BenchmarkDecode/OpenCensus-8 10 640141705 ns/op
BenchmarkDecode/OTLP/AttrMap-8 20 538879696 ns/op
BenchmarkDecode/OTLP/AttrList-8 30 222093182 ns/op
I wasn't expecting so much gain from using a list. Definitely worth considering. But we need a good solution for otelsvc (perhaps lazy-initializing the map from the list when needed).
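A Go sketch of that lazy-initialization idea (types are hypothetical, not from the reference implementation): keep the decoded attribute list untouched, which is cheap to decode and re-encode, and build a keyed index only the first time a processor actually needs keyed lookup:

```go
// Sketch only.
package attrs

// attribute is a key/value pair as it would appear on the wire when
// attributes are encoded as a repeated list instead of a map.
type attribute struct {
	Key   string
	Value string
}

// attributes wraps the decoded list and lazily builds a key -> position
// index on first keyed access.
type attributes struct {
	list  []attribute
	index map[string]int // built lazily
}

func (a *attributes) get(key string) (string, bool) {
	if a.index == nil {
		a.index = make(map[string]int, len(a.list))
		for i, kv := range a.list {
			a.index[kv.Key] = i
		}
	}
	if i, ok := a.index[key]; ok {
		return a.list[i].Value, true
	}
	return "", false
}
```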
If the client detects that the underlying transport is broken and must be re-established the client will optionally retry sending all requests that are still pending acknowledgement after re-establishing the transport. The client implementation SHOULD expose an option to turn on and off the retrying; by default retrying SHOULD be on.

If the client is shutting down (e.g. when the containing process wants to exit) the client will optionally wait until all pending acknowledgements are received or until an implementation specific timeout expires. This ensures reliable delivery of telemetry data. The client implementation SHOULD expose an option to turn on and off the waiting during shutdown.
If the client is shutting down
This should also apply to the Synchronous case (it is a special case with a single pending ack).
The client implementation SHOULD expose an option to turn on and off the waiting during shutdown.
Since this is prescriptive to implementers, I think it is good to say when the option can be changed.
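For illustration, a minimal Go sketch of the shutdown behavior being discussed, with the wait-on-shutdown option fixed at construction time (all names and structure are assumptions):

```go
// Sketch only.
package shutdown

import (
	"sync"
	"time"
)

// exporter is a trimmed-down stand-in for a client that tracks in-flight
// (not yet acknowledged) Export requests with a WaitGroup.
type exporter struct {
	waitOnShutdown bool // the option discussed above; set once at construction
	inflight       sync.WaitGroup
}

// shutdown optionally waits until all pending acknowledgements arrive or an
// implementation-specific timeout expires, whichever comes first.
func (e *exporter) shutdown(timeout time.Duration) {
	if !e.waitOnShutdown {
		return
	}
	done := make(chan struct{})
	go func() {
		e.inflight.Wait()
		close(done)
	}()
	select {
	case <-done:
	case <-time.After(timeout):
	}
}
```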
In Pipelined mode the client SHOULD continue processing responses that the server may send to a stream that is closed from the client side until that stream is closed from the server side as well. This means that during this transition there may exist more than one stream of data between the client and the server.

Similar stream re-opening MAY also be initiated from the server-side, however it is usually less preferable since it results in the loss of requests which are not yet acknowledged and requires the client to re-send the requests. Even if the client is well-behaved and does the re-sending correctly this still results in overall worse throughput of the protocol. Implementations where the server needs to serve a mix of clients some of which are utilizing stream re-opening strategy and some others don't are recommended to be configured `RebalanceTimeout` and `RebalanceRequests` values that are larger at server-side than the corresponding values at client-side. This way the protocol will fall back to stream re-opening at the server-side only for the clients, which don't already utilize client-side stream re-opening.
It is implicit that these are not the default gRPC server-side keepalive settings; perhaps it is worth mentioning that, and their interaction with keepalive settings (they have defaults even if not specified). A simplifying assumption to implement these on the server side is that it receives each individual request, responds to the request, and then checks the `RebalanceTimeout` and `RebalanceRequests` thresholds.
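A rough Go sketch of that simplifying server-side assumption (`RebalanceTimeout` and `RebalanceRequests` follow the names used in the discussion; everything else is illustrative):

```go
// Sketch only.
package server

import "time"

// rebalancePolicy holds the per-server thresholds discussed above.
type rebalancePolicy struct {
	rebalanceTimeout  time.Duration
	rebalanceRequests int
}

// streamStats is what the server would track for each open stream.
type streamStats struct {
	openedAt time.Time
	served   int // requests already responded to on this stream
}

// shouldClose is evaluated only after a request has been handled and its
// response sent: if either threshold has been crossed, the server closes
// the stream so the client reconnects (possibly to another backend).
func (p rebalancePolicy) shouldClose(s streamStats, now time.Time) bool {
	return now.Sub(s.openedAt) > p.rebalanceTimeout || s.served >= p.rebalanceRequests
}
```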
* Propose sampling API changes
* Add details about SamplingHint, Sampler interface and default samplers.
* Add details about start span operation.
* Update spelling text/0006-sampling.md Co-Authored-By: Yang Song <[email protected]>
* Update spelling text/0006-sampling.md Co-Authored-By: Yang Song <[email protected]>
* Add span name to the Sampler interface. Clarify how the sampling flags are exposed.
* Use ascii for personas
* Add spankind to the sampler input.
* Remove delayed span creation proposal from this RFC
* Remove unspecified type from sampler hint.
* Update text/0006-sampling.md Co-Authored-By: Tristan Sloughter <[email protected]>
* Add clarification that Sampler is always called.
- Add request timeout behavior.
- Change throttling to be based on gRPC error code.
- Change stream re-opening to be an implementation recommendation instead of protocol requirement.
- Add optional description for server-side stream re-opening.
Force-pushed from f2f8338 to cf5d613.
I opened a PR against the oteps repo (open-telemetry#35), let's continue discussion there. Closing this one.