ARROW-14958: [C++][Python][FlightRPC] Implement Flight middleware for OpenTelemetry propagation #11920

Merged (1 commit) on Sep 26, 2022

@lidavidm lidavidm commented Dec 9, 2021

Adds a client middleware that sends span/trace ID to the server, and a server middleware that gets the span/trace ID and starts a child span.

The middleware are also available in builds without OpenTelemetry, where they simply do nothing.

lidavidm commented Dec 9, 2021

Example output, from the unit test:

(The unit test does not normally print this; I modified it to double-check the results.)

{
  name          : DoAction
  trace_id      : a9d2a9ecd0aec3bb44bd10ad940a31b7
  span_id       : 7b025703fe739005
  tracestate    : 
  parent_span_id: 0000000000000000
  start         : 1639076215603966576
  duration      : 739917
  description   : 
  span kind     : Server
  status        : Ok
  attributes    : 
	thread_id: 140075521935104
	rpc.grpc.status_code: 0
	rpc.method: DoAction
	rpc.service: arrow.flight.protocol.FlightService
	rpc.system: grpc
  events        : 
  links         : 
  resources     : 
	service.name: unknown_service
	telemetry.sdk.version: 1.1.0
	telemetry.sdk.name: opentelemetry
	telemetry.sdk.language: cpp
  instr-lib     : arrow
}
{
  name          : DoAction
  trace_id      : a605cd6ac77d28ea4fc94062bb0ba455
  span_id       : f599a779803fdf67
  tracestate    : 
  parent_span_id: a9d7e9435894a240
  start         : 1639076215610106244
  duration      : 236620
  description   : 
  span kind     : Server
  status        : Ok
  attributes    : 
	thread_id: 140075513542400
	rpc.grpc.status_code: 0
	rpc.method: DoAction
	rpc.service: arrow.flight.protocol.FlightService
	rpc.system: grpc
  events        : 
  links         : 
  resources     : 
	service.name: unknown_service
	telemetry.sdk.version: 1.1.0
	telemetry.sdk.name: opentelemetry
	telemetry.sdk.language: cpp
  instr-lib     : arrow
}
{
  name          : test
  trace_id      : a605cd6ac77d28ea4fc94062bb0ba455
  span_id       : a9d7e9435894a240
  tracestate    : 
  parent_span_id: 0000000000000000
  start         : 1639076215607131441
  duration      : 3385487
  description   : 
  span kind     : Internal
  status        : Unset
  attributes    : 
	thread_id: 140075633232128
  events        : 
  links         : 
  resources     : 
	service.name: unknown_service
	telemetry.sdk.version: 1.1.0
	telemetry.sdk.name: opentelemetry
	telemetry.sdk.language: cpp
  instr-lib     : arrow
}
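
In the output above, the second DoAction span shares its trace_id with the "test" span and names that span's span_id as its parent_span_id, while the first DoAction span has an all-zero parent and is therefore a root span. The sketch below checks those relationships. `SpanRecord`, `IsRoot`, `IsChildOf`, and `CheckExampleOutput` are hypothetical names used only for illustration; they are not part of Arrow or OpenTelemetry.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified view of the fields printed above.
struct SpanRecord {
  std::string name;
  std::string trace_id;
  std::string span_id;
  std::string parent_span_id;
};

// An all-zero parent span ID marks a root span.
bool IsRoot(const SpanRecord& span) {
  return !span.parent_span_id.empty() &&
         span.parent_span_id.find_first_not_of('0') == std::string::npos;
}

// A span is a direct child if it shares the trace ID and names the parent.
bool IsChildOf(const SpanRecord& child, const SpanRecord& parent) {
  return child.trace_id == parent.trace_id &&
         child.parent_span_id == parent.span_id;
}

// Checks the relationships visible in the example output (IDs copied from it).
void CheckExampleOutput() {
  const SpanRecord test{"test", "a605cd6ac77d28ea4fc94062bb0ba455",
                        "a9d7e9435894a240", "0000000000000000"};
  const SpanRecord action{"DoAction", "a605cd6ac77d28ea4fc94062bb0ba455",
                          "f599a779803fdf67", "a9d7e9435894a240"};
  assert(IsRoot(test));
  assert(IsChildOf(action, test));
}
```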


lidavidm commented Dec 9, 2021

CC @cpcloud, this is "automatic" instrumentation for Flight/OpenTelemetry. AFAIK, this isn't generally possible in gRPC/C++. Server interceptors have no way to pass data from the interceptor to the RPC handler. (OpenCensus support is achieved by hardcoding it into the library.) Also, IIRC thread locals (i.e. the OTel Context) are not viable because the library makes no guarantee about whether RPC handlers are run on the same thread as interceptors or not.

Flight works around this because the server interceptors don't use the gRPC interceptor framework; instead, the Flight RPC handlers hardcode calls to the interceptors before handing control to the Flight application. Hence, a Span started in a Flight interceptor will be active during the application's RPC handler.

The OpenTelemetry/gRPC example just hardcodes a call to OpenTelemetry within the RPC handler and does not try to implement more general instrumentation.
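
A minimal sketch of the dispatch pattern described above, in which the RPC entry point itself calls the middleware before and after the application handler. All names here (`CallContext`, `Middleware`, `TracingMiddleware`, `HandleRpc`) are hypothetical stand-ins, not the Arrow Flight classes, and the "span" is just a string so that the example stays self-contained.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical per-call state shared by middleware and handler.
struct CallContext {
  std::string active_span;       // empty when no span is active
  std::vector<std::string> log;  // records what happened, for inspection
};

class Middleware {
 public:
  virtual ~Middleware() = default;
  virtual void CallStarted(CallContext* ctx) = 0;
  virtual void CallCompleted(CallContext* ctx) = 0;
};

class TracingMiddleware : public Middleware {
 public:
  void CallStarted(CallContext* ctx) override {
    ctx->active_span = "DoAction";
    ctx->log.push_back("span started");
  }
  void CallCompleted(CallContext* ctx) override {
    ctx->log.push_back("span ended");
    ctx->active_span.clear();
  }
};

// The entry point invokes the middleware around the handler, so whatever
// the middleware sets up is still in place while the handler runs.
// Completing in reverse order is a choice made for this sketch.
void HandleRpc(const std::vector<std::unique_ptr<Middleware>>& middleware,
               const std::function<void(CallContext*)>& handler,
               CallContext* ctx) {
  assert(ctx != nullptr);
  for (const auto& m : middleware) m->CallStarted(ctx);
  handler(ctx);
  for (auto it = middleware.rbegin(); it != middleware.rend(); ++it) {
    (*it)->CallCompleted(ctx);
  }
}
```

Because the handler is called between `CallStarted` and `CallCompleted`, no data has to be passed through an interceptor framework or a thread local for the handler to observe the span.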

lidavidm added a commit that referenced this pull request Feb 28, 2022
Quickly bump the version since it changes a few APIs we'll use (most notably for #11920).

#11963 will also need updating, but the conda-forge packages need to be updated first.

This does not include the fix needed for #12408; that will require another version bump.

Closes #12516 from lidavidm/arrow-15789

Authored-by: David Li <[email protected]>
Signed-off-by: David Li <[email protected]>
@lidavidm lidavidm changed the title ARROW-14958: [C++] Implement Flight middleware for OpenTelemetry propagation ARROW-14958: [C++][Python] Implement Flight middleware for OpenTelemetry propagation Feb 28, 2022
@lidavidm lidavidm changed the title ARROW-14958: [C++][Python] Implement Flight middleware for OpenTelemetry propagation ARROW-14958: [C++][Python][FlightRPC] Implement Flight middleware for OpenTelemetry propagation Mar 8, 2022

pitrou commented May 4, 2022

@lidavidm Does this PR need reviving or should it be closed?


lidavidm commented May 4, 2022

I'll clean this up at some point. It mostly needs a suitable reviewer.

@lidavidm lidavidm marked this pull request as draft May 4, 2022 16:30
@lidavidm lidavidm force-pushed the arrow-14958 branch 3 times, most recently from 16818dc to 38eec85 Compare May 24, 2022 17:27
@lidavidm lidavidm marked this pull request as ready for review May 24, 2022 21:03

pitrou commented May 30, 2022

It seems this needs rebasing and fixing conflicts :-(

@lidavidm lidavidm force-pushed the arrow-14958 branch 2 times, most recently from a7c8757 to dd6a25a Compare May 31, 2022 16:35
auto context = otel::context::RuntimeContext::GetCurrent();
auto propagator =
    otel::context::propagation::GlobalTextMapPropagator::GetGlobalPropagator();
propagator->Inject(carrier, context);
Member

Wouldn't this make a copy of carrier?


void SendingHeaders(AddCallHeaders* outgoing_headers) override {
  for (const auto& pair : carrier_.context_) {
    outgoing_headers->AddHeader(pair.first, pair.second);
Member

Hmm... if OTel is adding arbitrary key-value pairs, should these really be propagated as HTTP headers, or am I missing something?

Member Author

It's designed to be transported in headers; the API just doesn't tell you what the keys are. They're well defined in the spec (formats listed here: https://github.com/open-telemetry/opentelemetry-specification/blob/e2c2472985b17e37a25a7dc5aa0aa071e6683c98/specification/context/api-propagators.md#propagators-distribution).

Member

Hmm, can you add a comment about that?
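
As a rough illustration of why the key-value pairs are header-safe: one of the formats in the linked spec is W3C Trace Context, whose `traceparent` entry is a fixed-shape ASCII string. The sketch below builds one by hand. `HeaderCarrier` and `InjectTraceParent` are hypothetical names, not the OpenTelemetry or Arrow API; a real propagator also handles `tracestate` and validation, which are omitted here.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical carrier: collects whatever key-value pairs a propagator emits.
struct HeaderCarrier {
  std::vector<std::pair<std::string, std::string>> pairs;
  void Set(const std::string& key, const std::string& value) {
    pairs.emplace_back(key, value);
  }
};

// Writes a W3C Trace Context "traceparent" entry:
//   version "00" - 32 hex trace ID - 16 hex parent span ID - 2 hex flags
void InjectTraceParent(const std::string& trace_id, const std::string& span_id,
                       bool sampled, HeaderCarrier* carrier) {
  assert(carrier != nullptr);
  assert(trace_id.size() == 32 && span_id.size() == 16);
  carrier->Set("traceparent",
               "00-" + trace_id + "-" + span_id + (sampled ? "-01" : "-00"));
}
```

Each pair collected this way can then be forwarded one by one as a call header, which is what the loop in `SendingHeaders` does with the carrier's contents.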

@@ -61,6 +60,10 @@ enum class FlightMethod : char {
  DoExchange = 9,
};

/// \brief Get a human-readable name for a Flight method.
ARROW_FLIGHT_EXPORT
std::string FlightMethodToString(FlightMethod method);
Member

We already use the ToString convention in other places, so perhaps:

Suggested change
- std::string FlightMethodToString(FlightMethod method);
+ std::string ToString(FlightMethod method);

Comment on lines 56 to 58
TraceKey() = default;
TraceKey(std::string key, std::string value)
    : key(std::move(key)), value(std::move(value)) {}
Member

C++ should synthesize these constructors automatically?

Member Author

Apparently this won't work until C++20 (https://stackoverflow.com/a/61205386/262727)

Member

No, but you could try items_->push_back({std::string(key), std::string(value)})
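
A small sketch of the suggestion, assuming the struct has no user-declared constructors. The `TraceKey` here is a simplified stand-in for the one in the diff, and `AddItem` is a hypothetical helper. Brace initialization builds the aggregate in place, whereas `emplace_back(key, value)` uses parenthesized initialization, which only works for aggregates from C++20 on.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in: an aggregate with no user-declared constructors.
struct TraceKey {
  std::string key;
  std::string value;
};

// Hypothetical helper mirroring the suggested push_back call.
void AddItem(std::vector<TraceKey>* items, const char* key, const char* value) {
  assert(items != nullptr);
  // The braced list initializes the aggregate's members in order, so no
  // constructor is required.
  items->push_back({std::string(key), std::string(value)});
}
```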

} else {
  span_->SetStatus(otel::trace::StatusCode::kOk, "");
  span_->SetAttribute(OTEL_GET_TRACE_ATTR(AttrRpcGrpcStatusCode), int32_t(0));
}
Member

Newbie question, but will this automatically end the span?

Member Author

It'll end when the Span goes out of scope, but I'll just manually end it here to be explicit.
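
The two options mentioned in the reply (ending on scope exit versus ending explicitly) can be sketched with a scope-bound object. `ScopedSpan` below is a hypothetical class written for this illustration and is not the OpenTelemetry span type; it records the names of ended spans in a vector so the behaviour can be observed.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical span: ends itself on destruction unless ended earlier.
class ScopedSpan {
 public:
  ScopedSpan(std::string name, std::vector<std::string>* ended)
      : name_(std::move(name)), ended_(ended) {
    assert(ended_ != nullptr);
  }
  ScopedSpan(const ScopedSpan&) = delete;
  ScopedSpan& operator=(const ScopedSpan&) = delete;

  // Fallback: the destructor ends the span if nobody did so explicitly.
  ~ScopedSpan() { End(); }

  // Explicit end; calling it more than once has no further effect.
  void End() {
    if (!done_) {
      done_ = true;
      ended_->push_back(name_);
    }
  }

 private:
  std::string name_;
  std::vector<std::string>* ended_;
  bool done_ = false;
};
```

Ending explicitly makes the end time visible at the call site, and in this sketch the destructor then does nothing further.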

Comment on lines 170 to 171
Result() = default;
explicit Result(std::shared_ptr<Buffer> body) : body(std::move(body)) {}
Member

This shouldn't be necessary either?

reinterpret_cast<TracingServerMiddleware*>(call_context.GetMiddleware("tracing"));
if (!middleware) return Status::Invalid("Could not find middleware");
#ifdef ARROW_WITH_OPENTELEMETRY
EXPECT_GT(middleware->GetTraceContext().size(), 0);
Member

Can you comment a bit on what this does/tests?

opentelemetry::context::propagation::GlobalTextMapPropagator::SetGlobalPropagator(
    opentelemetry::nostd::shared_ptr<
        opentelemetry::context::propagation::TextMapPropagator>(
        new opentelemetry::trace::propagation::HttpTraceContext()));
Member

Is this the kind of setup that third party server code would have to write as well?

Member Author

Yes, the application would do this as part of initializing OpenTelemetry, though generally the SDKs provide conveniences to configure this.

def do_action(self, context, action):
    trace_context = context.get_middleware("tracing").trace_context
    # Don't turn this method into a generator since then it'll be
    # lazily evaluated, and the trace context will be lost
Member

If trace_context is a regular Python dict, how would it be lost?

Member Author

Evaluating .trace_context is side-effectful and depends on implicit state maintained by OpenTelemetry, so if this is a generator it'll be evaluated after OpenTelemetry has already cleaned up the state.

Member Author

I've updated the comment to be clearer about what goes on.
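
The hazard discussed in this thread is about Python generators, but the mechanism (a deferred read of implicit state that has already been cleaned up) can be sketched in C++ as well. Everything below is a hypothetical analogue: `g_current_trace` stands in for the implicit OpenTelemetry state, `TraceScope` for its setup and cleanup, and the deferred callable for a generator body that only runs on first iteration.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical implicit state, standing in for the active trace context.
thread_local std::string g_current_trace;

// Sets the implicit state for the lifetime of the scope, then clears it.
class TraceScope {
 public:
  explicit TraceScope(const std::string& id) {
    assert(!id.empty());
    g_current_trace = id;
  }
  ~TraceScope() { g_current_trace.clear(); }
  TraceScope(const TraceScope&) = delete;
  TraceScope& operator=(const TraceScope&) = delete;
};

// Eager: reads the implicit state now, while the scope is still active.
std::function<std::string()> EagerHandler() {
  std::string captured = g_current_trace;
  return [captured] { return captured; };
}

// Lazy: defers the read until the result is consumed, as a generator would.
std::function<std::string()> LazyHandler() {
  return [] { return g_current_trace; };
}
```

If both handlers are created inside a `TraceScope` and invoked after it ends, only the eager one still returns the trace ID; the lazy one sees the cleared state.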

@lidavidm
Member Author

Rebased. I will merge assuming there are no further comments, as this has been sitting around for a long time.

@lidavidm lidavidm force-pushed the arrow-14958 branch 2 times, most recently from 8c211f3 to 6a8660a Compare September 25, 2022 01:01
@lidavidm lidavidm merged commit be30611 into apache:master Sep 26, 2022
@lidavidm lidavidm deleted the arrow-14958 branch September 26, 2022 11:42

ursabot commented Sep 26, 2022

Benchmark runs are scheduled for baseline = f941118 and contender = be30611. be30611 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.2% ⬆️0.0%] test-mac-arm
[Failed ⬇️1.37% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.14% ⬆️0.04%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] be306118 ec2-t3-xlarge-us-east-2
[Finished] be306118 test-mac-arm
[Failed] be306118 ursa-i9-9960x
[Finished] be306118 ursa-thinkcentre-m75q
[Finished] f941118e ec2-t3-xlarge-us-east-2
[Failed] f941118e test-mac-arm
[Failed] f941118e ursa-i9-9960x
[Finished] f941118e ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
