Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds minimal traceparent header support to Elasticsearch #74210

Merged
merged 15 commits into from
Jun 21, 2021

Conversation

Mpdreamz
Copy link
Member

@Mpdreamz Mpdreamz commented Jun 16, 2021

This adds just enough support for the traceparent header to be useful in master.

Since Elasticsearch already logs in ECS format extending it with support for transaction.id and trace.id is a quick win.

This allows us to surface server/deprecation slow logs from an instrumented application using the Trace Logs feature.

image

This forwards to Logs UI with a filter on trace.id

image

The parsing of the header itself is naive. I am hoping Elasticsearch in the future can take a dependency on the Java Agent OOTB (https://github.com/elastic/apm-agent-java) which ships with high performance parsing and validation routines. Which could then act as a jumping off point to manually instrument Elasticsearch.

We can work on incrementally adding value 🥳

NOTE: this needs #74211 to prevent clashes with Filebeat/ECS defaults

Comment on lines 354 to 355
String[] tokens = traceparent.split("-");
if (tokens.length == 4) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a suggestion: you could validate the length of the header. If it has the expected minimum length, use java.lang.String#substring(int, int) with the known offsets for the trace id and parent id.

@@ -570,7 +570,7 @@ protected Node(final Environment initialEnvironment,
final Transport transport = networkModule.getTransportSupplier().get();
Set<String> taskHeaders = Stream.concat(
pluginsService.filterPlugins(ActionPlugin.class).stream().flatMap(p -> p.getTaskHeaders().stream()),
Stream.of(Task.X_OPAQUE_ID)
Stream.of(Task.X_OPAQUE_ID, Task.TRACE_PARENT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this make sure the header gets propagated to other downstream ES nodes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaik this makes sure the headers go over the transport wire to other nodes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though it would technically render the traceparent header invalid, we may want to zero out the parent-id part of the header when sending to downstream nodes.

Kibana 
  |
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
  |
  v
ES Node1
  |
traceparent: 00-0af7651916cd43dd8448eb211c80319c-0000000000000000-01
  |
  v
ES Node 2

That is because ES is not creating spans currently and thus there's no parent.
Based on the outcome of this discussion (#74210 (comment)), we don't need that part of the traceparent header anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed transaction.id.

I personally prefer not zeroing out and treat ES as pass-through for now given this piece of information is no longer used.

Mpdreamz added 5 commits June 18, 2021 09:14
This makes Elasticsearch accept `traceparent` header and makes sure its
available as variable in log4j.

By default ECS logger includes it.
@matriv matriv added the :Core/Infra/Logging Log management and logging utilities label Jun 18, 2021
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Jun 18, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@Mpdreamz Mpdreamz force-pushed the feature/trace-header branch from 99c90de to e3bb235 Compare June 18, 2021 07:49
Copy link
Contributor

@pgomulka pgomulka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good!
can we also add testing in JsonLoggerTests ?
I am not sure what is the difference for trace_parent and trace_id ?

@@ -344,6 +345,17 @@ private void tryAllHandlers(final RestRequest request, final RestChannel channel
BytesRestResponse.
createSimpleErrorResponse(channel, BAD_REQUEST, "multiple values for single-valued header [" + name + "]."));
return;
} else if (name.equals(Task.TRACE_PARENT)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we could extract the whole headers copy to a separate method?

ThreadContextStruct threadContextStruct =
DEFAULT_CONTEXT.putHeaders(Map.of(Task.X_OPAQUE_ID, context.requestHeaders.get(Task.X_OPAQUE_ID)));

if (context.requestHeaders.containsKey(Task.X_OPAQUE_ID) || context.requestHeaders.containsKey(Task.TRACE_PARENT)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need for TRACE_ID ?

Copy link
Member

@felixbarny felixbarny Jun 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so as TRACE_ID a transient header. It's not getting in our out of a single node. It's just used for log correlation.

ThreadContextStruct threadContextStruct =
DEFAULT_CONTEXT.putHeaders(Map.of(Task.X_OPAQUE_ID, context.requestHeaders.get(Task.X_OPAQUE_ID)));

if (context.requestHeaders.containsKey(Task.X_OPAQUE_ID) || context.requestHeaders.containsKey(Task.TRACE_PARENT)) {
Copy link
Member

@felixbarny felixbarny Jun 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so as TRACE_ID a transient header. It's not getting in our out of a single node. It's just used for log correlation.

@Mpdreamz
Copy link
Member Author

Mpdreamz commented Jun 18, 2021

looks good!
can we also add testing in JsonLoggerTests ?

Will do! 😸

I am not sure what is the difference for trace_parent and trace_id ?

trace_parent refers to the whole header, trace_id refers to the parsed trace_id portion inside the header.
See the specification for naming: https://www.w3.org/TR/trace-context/#traceparent-header

@Mpdreamz Mpdreamz requested a review from pgomulka June 18, 2021 13:52
@felixbarny
Copy link
Member

@pgomulka will the trace.id be added to the ECS JSON logs only or also to plain-text logs?
IIRC, plain-text is the default in 7.x and ECS JSON is the default in master/8.0, right?

@pgomulka
Copy link
Contributor

pgomulka commented Jun 21, 2021

@pgomulka will the trace.id be added to the ECS JSON logs only or also to plain-text logs?
IIRC, plain-text is the default in 7.x and ECS JSON is the default in master/8.0, right?

@felixbarny
Both ES8 and ES7 emit plaintext logs. ES7 also emits JSON logs, but not in ECS schema
ES8 emits JSON logs in ECS schema

the sample log line for ES8 will be

{
  "@timestamp": "2021-06-21T11:44:05.345Z",
  "log.level": "INFO",
  "message": "[my_index4] creating index, cause [auto(bulk api)], templates [], shards [1]/[1]",
  "service.name": "ES_ECS",
  "process.thread.name": "elasticsearch[runTask-0][masterService#updateTask][T#1]",
  "log.logger": "org.elasticsearch.cluster.metadata.MetadataCreateIndexService",
  "event.dataset": "elasticsearch.server",
  "trace.id": "0af7651916cd43dd8448eb211c80319c",
  "elasticsearch.cluster.uuid": "0DXKTcoxQVWTczJONPHTWA",
  "elasticsearch.node.id": "1NGejCPMROaZcMnpQysNIw",
  "elasticsearch.node.name": "runTask-0",
  "elasticsearch.cluster.name": "runTask"
}

and the corresponding plain text log line emitted at the same time

[2021-06-21T13:44:05,345][INFO ][o.e.c.m.MetadataCreateIndexService] [runTask-0] [my_index4] creating index, cause [auto(bulk api)], templates [], shards [1]/[1]

This change is intended to be in v8, right?

@pugnascotia
Copy link
Contributor

@pgomulka I skimmed over the diffs, looked reasonable and nothing stood out.

@pgomulka pgomulka merged commit c412aae into elastic:master Jun 21, 2021
@felixbarny
Copy link
Member

This change is intended to be in v8, right?

IINM, @mshustov was hoping to get the trace.id in 7.x logs (at least slowlogs) so that there's end-to-end log correlation from Kibana to ES.
I suppose adding the trace.id to 7.x JSON logs wouldn't be too much of a hassle? Plain-text logs might be more tricky as it would also require changes in the log parsing pipelines.

@pgomulka in 8.x, will the default log format be ECS JSON for Elasticsearch? If so, I think it makes sense to do the same in Kibana. WDYT @mshustov?

@mshustov
Copy link

mshustov commented Jun 22, 2021

IINM, @mshustov was hoping to get the trace.id in 7.x logs (at least slowlogs) so that there's end-to-end log correlation from Kibana to ES.

yes. I started talking to the ES team. I will open an issue a bit later.

in 8.x, will the default log format be ECS JSON for Elasticsearch?

Kibana does support ECS JSON format in v8.0, but I've never heard it's going to be the default in Elasticseasrch as of the next major release. @pgomulka Is there an appropriate issue in the Elasticsearch repo? I didn't manage to find one.

Plain-text logs might be more tricky as it would also require changes in the log parsing pipelines.

@pgomulka Kibana and Elasticsearch must align on this behavior. Is the Elasticsearch team going to attach tracing information to the Plain-test logs?

@pgomulka
Copy link
Contributor

pgomulka commented Jun 22, 2021

the default will be ECS JSON format in v8.0. ES will emit .json log files (in ECS) for server, slow logs, deprecated logs. These log files contain additional 'meta data' fields like node ids, cluster names, x-opaque-id and now trace.id.

The purpose of the plaintext (.log) server logs is to be human readable and it only contains the basic fields (timestamp, class, thread, message). Note that only server logs will be in plaintext. All other logs are only in json.

here is an issue #46119
and a PR that makes the JSON logs in ECS and remove the plaintext slow logs/deprecation logs https://github.com/elastic/elasticsearch/pull/47105/files

you should expect these logs to be generated by ESv8

console - > plaintext
my_cluster.log plaintext
my_cluster_server.json (ECS)
my_cluster_audit.json
my_cluster_deprecation.json (ECS)
my_cluster_index_indexing_slowlog.json (ECS)
my_cluster_index_search_slowlog.json (ECS)

@albertzaharovits do you know if we plan to have audit logs in ECS? Do we want trace.id there as well?

pgomulka pushed a commit to pgomulka/elasticsearch that referenced this pull request Jul 14, 2021
This adds just enough support for the traceparent header to be useful in es8.
Since Elasticsearch already logs in ECS format extending it with support for transaction.id and trace.id is a quick win.
This allows us to surface server/deprecation slow logs from an instrumented application using the Trace Logs feature.
Parsing `traceparent` in http layer and populating tasks with `trace_id` which is preserved in thread context.
pgomulka added a commit that referenced this pull request Aug 3, 2021
) (#75331)

This adds just enough support for the traceparent header to be useful in es7
Contrary to es8 in 7.x branch the logs are in custom JSON format - not in ECS.
trace.id is added to deprecation, slow logs and server JSON logs

backport #74210
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Aug 4, 2021
A new trace.id header is added by elastic#74210. It is handled almost the same
way as x-opaque-id. Specifically, it gets passed into default thread
context. This means the existing assertion should expect it in addition
to x-opaque-id.
ywangd added a commit that referenced this pull request Aug 5, 2021
A new trace.id header is added by #74210. It is handled almost the same
way as x-opaque-id. Specifically, it gets passed into default thread
context. This means the existing assertion should expect it in addition
to x-opaque-id.
elasticsearchmachine pushed a commit to elasticsearchmachine/elasticsearch that referenced this pull request Aug 5, 2021
A new trace.id header is added by elastic#74210. It is handled almost the same
way as x-opaque-id. Specifically, it gets passed into default thread
context. This means the existing assertion should expect it in addition
to x-opaque-id.
ywangd pushed a commit that referenced this pull request Aug 5, 2021
A new trace.id header is added by #74210. It is handled almost the same
way as x-opaque-id. Specifically, it gets passed into default thread
context. This means the existing assertion should expect it in addition
to x-opaque-id.
pgomulka added a commit that referenced this pull request Jan 25, 2022
since #74210 ES is emitting trace.id into its logs, but it did not emit it into audit logs.
This commit adds trace.id into audit logging.
pgomulka added a commit to pgomulka/elasticsearch that referenced this pull request Jan 25, 2022
since elastic#74210 ES is emitting trace.id into its logs, but it did not emit it into audit logs.
This commit adds trace.id into audit logging.
pgomulka added a commit to pgomulka/elasticsearch that referenced this pull request Jan 25, 2022
since elastic#74210 ES is emitting trace.id into its logs, but it did not emit it into audit logs.
This commit adds trace.id into audit logging.
pgomulka added a commit that referenced this pull request Jan 25, 2022
since #74210 ES is emitting trace.id into its logs, but it did not emit it into audit logs.
This commit adds trace.id into audit logging.
backport #82849
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Logging Log management and logging utilities >enhancement Team:Core/Infra Meta label for core/infra team v7.15.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants