Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Jaeger exporter #534

Merged
merged 41 commits into from
May 11, 2021
Merged

Add Jaeger exporter #534

merged 41 commits into from
May 11, 2021

Conversation

ThomsonTan
Copy link
Contributor

@codecov
Copy link

codecov bot commented Jan 26, 2021

Codecov Report

Merging #534 (e38dec6) into main (e901a36) will increase coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #534      +/-   ##
==========================================
+ Coverage   94.75%   94.76%   +0.01%     
==========================================
  Files         217      217              
  Lines        9923     9923              
==========================================
+ Hits         9403     9404       +1     
+ Misses        520      519       -1     
Impacted Files Coverage Δ
...entelemetry/exporters/memory/in_memory_span_data.h 100.00% <ø> (ø)
sdk/src/logs/batch_log_processor.cc 95.00% <0.00%> (+1.25%) ⬆️

Copy link
Contributor

@jsuereth jsuereth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need help figuring out a bazel build? If so, ping me on gitter, happy to help get this out the door!


private:
std::unique_ptr<AgentClient> agent_;
std::shared_ptr<TTransport> socket_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these shared_ptr?

There's ref-counting overhead vs. unique_ptr. Are these actually shared?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Josh. Yes, I also prefer unique_ptr if possible. For the cases of using shared_ptr, that would be needed as it is required to be passed to other functions as shared_ptr, so I just created it as shared_ptr in the first place.

exporters/jaeger/src/udp_transport.cc Outdated Show resolved Hide resolved
@ThomsonTan ThomsonTan changed the title Add Jaeger exporter [WIP] Add Jaeger exporter Feb 1, 2021
@seemk
Copy link
Contributor

seemk commented Apr 7, 2021

Are there plans to continue with this draft PR? We might need this for nginx and httpd instrumentation, would gladly help.

@ThomsonTan
Copy link
Contributor Author

Are there plans to continue with this draft PR? We might need this for nginx and httpd instrumentation, would gladly help.

@seemk thanks. I'll clean and push my remaining changes this week. Your help will be appreciated.

@ThomsonTan ThomsonTan marked this pull request as ready for review April 9, 2021 19:01
@ThomsonTan ThomsonTan requested a review from a team April 9, 2021 19:01
@ThomsonTan ThomsonTan changed the title [WIP] Add Jaeger exporter Add Jaeger exporter Apr 9, 2021
Copy link
Member

@lalitb lalitb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding Jaeger support :)

exporters/jaeger/src/recordable.cc Outdated Show resolved Hide resolved
core::SystemTimestamp timestamp,
const common::KeyValueIterable &attributes) noexcept
{
// TODO: convert event to Jaeger Log
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can have tickets for the TODOs if they are not planned in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll log issues for TODOs when this PR is ready to merge.

{
if (code == trace::StatusCode::kUnset)
{
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't return here, as the description should still be needed to set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conversion doc says the status (Jaeger span status?) should not be set for UNSET, let me know if this is still needed.

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk_exporters/jaeger.md#status

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, looks fine in that case.


using namespace jaegertracing;

class Recordable final : public sdk::trace::Recordable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Not something which needs immediate attention, but probably we should have recordable names after transport types - eg TUDPRecordable. This is an issue with Zipkin exporter too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I logged the same comment for ETW exporter. I think we could create a task for the cleanup for all the exporters? I'll try the renaming anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created issue #735 for this.

exporters/jaeger/src/jaeger_exporter.cc Show resolved Hide resolved
exporters/jaeger/CMakeLists.txt Outdated Show resolved Hide resolved
struct JaegerExporterOptions
{
// The endpoint to export to.
std::string server_addr = "localhost";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps future proofing, but ff Thrift over HTTP is supported, perhaps rename it to endpoint and parse host, port from it? With HTTP it will be necessary to specify the full endpoint, e.g. localhost:1234/api/traces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I suggest we refactor the option in later PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

const uint32_t span_size = CalcSizeOfSerializedThrift(jaeger_span);
if (span_size > max_span_bytes)
{
// TODO, log too large span error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be nice to have, but how does otel-cpp log in general? I.e. I don't think it should write to stdout or stderr

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use otel-cpp logger, and let the application decide where to export it. There is ticket #408
to handle error logging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's unify the SDK internal logging once we have it.

{
kThriftUdp,
kThriftUdpCompact,
kThriftHttp,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps only the currently supported transports should be listed here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that works too. But as this list is expected to be stable, I just leave it there to make the header file stable (later extension should not need to update the header file, just update the .cc file). But let me know if removing the unimplemented formats is preferred.

Copy link
Member

@lalitb lalitb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Sorry for keeping it waiting for long. I think we can move it to main-stream faster, and then evolve for any changes.


uint32_t TUDPTransport::read(uint8_t *buf, uint32_t len)
{
uint32_t num_read = recvfrom(socket_,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this call block the exporter (and so the application) till any response is received from the server/agent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a block call but it is not supposed to be called in the export process (only sendto is called). It is here because it is required to implement the full thrift Jaeger class.

{
if (code == trace::StatusCode::kUnset)
{
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, looks fine in that case.

const uint32_t span_size = CalcSizeOfSerializedThrift(jaeger_span);
if (span_size > max_span_bytes)
{
// TODO, log too large span error.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use otel-cpp logger, and let the application decide where to export it. There is ticket #408
to handle error logging.

return 0;
}

std::lock_guard<std::mutex> guard{lock_};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to lock here? Also, can we use common::SpinLockMutex ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the mutex guarded operation is a little heavy (network IO is involved), I am not preferring using SpingLockMutex to perform some busy wait here. Let me know your thoughts on switching to common::SpinLockMutext.

exporters/jaeger/src/thrift_sender.cc Outdated Show resolved Hide resolved
return FlushWithLock();
}

int ThriftSender::FlushWithLock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does FlushWithLock() and Append() run on separate threads ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Append() could be called from a different thread than the one calling FlushWithLock().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the spec (see below excerpt), I removed the lock on the thrift_sender which belong to the exporter instance.

Export() will never be called concurrently for the same exporter instance. Export() can be called again only after the current call returns.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice.

span_buffer_.push_back(jaeger_span);
if (byte_buffer_size_ < max_span_bytes)
{
return 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean if there are very few spans created, they may keep waiting for a long time in the queue. Should there be some timeout, or we somehow ensure that every Export() results in the export over the network?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The few events could be there till the tracer object is destructed. We probably could create a separate thread to do flush periodically. But I haven't seen other implementations have the timeout mechanism. How about filing a ticket to follow up?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably could create a separate thread to do flush periodically.

This doesn't sound correct to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sharing the spec and OpenTelemetry-DotNet link. The spec clearly requires each batch of spans to be serialized and sent to destination. Add a call to Flush() at the end of Export(batch) for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that seems a good approach.

@lalitb
Copy link
Member

lalitb commented May 5, 2021

Looks good. @jsuereth has requested a change. We can pull this in once he re-review it.

@lalitb lalitb requested a review from jsuereth May 6, 2021 19:30
Copy link
Contributor

@jsuereth jsuereth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good for the working first copy.

Agree with most comments to open follow-on bugs. Do you need help getting bazel working here? As for installation instructions for bazel, it'll likely be "Just add this dependency in your bazel build". I'm a bit swamped with stuff right now, but I'm happy to help with the time I have on bazel-related build issue if needed.

)

# TODO: enable bazel build
# cc_binary(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: anything I can do to help get this working?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jsuereth I will send out another PR with bazel build script then will need your help to make sure it is installing thrift dependence in the right way.


using namespace jaegertracing;

class Recordable final : public sdk::trace::Recordable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

struct JaegerExporterOptions
{
// The endpoint to export to.
std::string server_addr = "localhost";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@lalitb
Copy link
Member

lalitb commented May 10, 2021

@ThomsonTan - Seems this is good to merge now. As I understand, you and/or @jsuereth will take care of bazel build as separate PR.

@ThomsonTan ThomsonTan added the pr:please-merge This PR is ready to be merged by a Maintainer (rebased, CI passed, has enough valid approvals, etc.) label May 10, 2021
@lalitb lalitb merged commit 5914160 into open-telemetry:main May 11, 2021
@lalitb lalitb mentioned this pull request May 11, 2021
@ThomsonTan ThomsonTan deleted the JaegerExporter branch November 9, 2022 22:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:exporter pr:please-merge This PR is ready to be merged by a Maintainer (rebased, CI passed, has enough valid approvals, etc.)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants