Add semantic conventions for Elasticsearch client instrumentation #3358

Closed
wants to merge 13 commits into from

Conversation

estolfo

@estolfo estolfo commented Apr 3, 2023

Fixes open-telemetry/semantic-conventions#706

Changes

Add semantic conventions for Elasticsearch client instrumentation span names and attributes.

Related OTEP(s)

@estolfo estolfo requested review from a team April 3, 2023 11:14
@arminru arminru added spec:trace Related to the specification/trace directory semconv:database area:semantic-conventions Related to semantic conventions labels Apr 4, 2023
Member

@reyang reyang left a comment

LGTM with the understanding of the upcoming changes #3358 (comment).

@trask
Member

trask commented Apr 5, 2023

@estolfo can you post a comparison of this proposal with the existing Java and Python implementations? I think that would help us pull in and get feedback from Java and Python reviewers.

@estolfo
Author

estolfo commented Apr 11, 2023

Python

This is what I see from looking at the source code and the tests:

Span name: Elasticsearch/ + (url || method || “request”)
For example: Elasticsearch/test-index/_doc/:id

  • doc id is replaced with :id
  • If the request is _search, the target is replaced with <target>
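As a rough illustration of the naming rule described above, the sketch below rebuilds such a name from a request path. It is not the code from the Python contrib instrumentation: the function name `legacy_span_name`, its parameters, and the choice to drop the leading slash are assumptions made for this example.

```python
def legacy_span_name(url=None, method=None, doc_id=None, target=None):
    """Illustrative only: name = "Elasticsearch/" + (url || method || "request")."""
    if not url:
        # No path available: fall back to the HTTP method, then to "request".
        return "Elasticsearch/" + (method or "request")

    segments = [s for s in url.split("/") if s]
    is_search = "_search" in segments

    rewritten = []
    for segment in segments:
        if doc_id is not None and segment == doc_id:
            # The document id is replaced with ":id".
            rewritten.append(":id")
        elif is_search and target is not None and segment == target:
            # For _search requests the target is replaced with "<target>".
            rewritten.append("<target>")
        else:
            rewritten.append(segment)
    return "Elasticsearch/" + "/".join(rewritten)


print(legacy_span_name("/test-index/_doc/123", "GET", doc_id="123"))
print(legacy_span_name("/users/_search", "GET", target="users"))
```

The first call reproduces the `Elasticsearch/test-index/_doc/:id` example given above.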

Special span attributes:

elasticsearch.id is the document id
elasticsearch.method is the http method
elasticsearch.url is the url (the path)
elasticsearch.target is the index or data stream
elasticsearch.params is query params

result fields:
elasticsearch.found
elasticsearch.timed_out
elasticsearch.took

Other fields

db_system is elasticsearch
db_statement is the request body

cc @trask

@estolfo
Author

estolfo commented Apr 12, 2023

Java

It's not so clear to me how the Java instrumentation works. I think they create spans for both the high-level REST client and the transport layer. I think it'd be best to get the Java group's input on whether the following is accurate.

The span name depends on the level of the instrumentation. From what I can tell, the rest client instrumentation creates a span with the http method as the name, for example GET. I think I see that the transport client instrumentation uses a span name corresponding to the "action". For example, GetAction.

Special Span attributes

elasticsearch.action
elasticsearch.request
elasticsearch.request.indices
elasticsearch.request.search.types
elasticsearch.type
elasticsearch.id
elasticsearch.version
elasticsearch.shard.broadcast.total
elasticsearch.shard.broadcast.successful
elasticsearch.shard.broadcast.failed
elasticsearch.shard.replication.total
elasticsearch.shard.replication.successful
elasticsearch.shard.replication.failed
elasticsearch.response.status
elasticsearch.shard.bulk.id
elasticsearch.shard.bulk.index
elasticsearch.node.failures
elasticsearch.node.cluster.name
elasticsearch.request.write.type
elasticsearch.request.write.routing
elasticsearch.request.write.version

Other fields

db_system is elasticsearch
db_operation is the http method
db_statement is http method + path, for example GET _cluster/health
http attributes
net attributes

cc @trask

@estolfo
Author

estolfo commented Apr 12, 2023

.NET

I'm adding another one for good luck :)
Here is a summary of the .NET Elasticsearch client instrumentation, which is native to the client, in case it's helpful.

Span name: One of the URL template values listed here.

Other fields

db.system is elasticsearch
db.statement is request body

http attributes / net attributes

Member

@trask trask left a comment

thx!

Comment on lines 12 to 16
The elasticsearch url is modified with placeholders in order to reduce the cardinality of the span name. When the url
contains a document id, it SHOULD be replaced by the identifier `{id}`. When the url contains a target data stream or
index, it SHOULD be replaced by `{target}`.
For example, a request to `/test-index/_doc/123` should have the span name `GET /{target}/_doc/{id}`.
When there is no target or document id, the span name will contain the exact url, as in `POST /_search`.
Member

when looking at a given elasticsearch url, what's the best way for instrumentation to detect which part of it should be replaced by {id} and which part should be replaced with {target}? is there a limited set of elasticsearch url patterns?

Author

Yes, there is a file in the Elasticsearch specification repo that can be used.
As an example, the .NET client uses this generator to create this path lookup file that is used in the instrumentation.
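To make the lookup idea concrete, here is a minimal sketch of how an instrumentation might map a request path to a low-cardinality span name. It assumes a small, hand-written template list; a real client would generate the list from the Elasticsearch specification. `TEMPLATES` and `describe_request` are hypothetical names, and falling back to the raw path for unmatched URLs is a choice made for the example.

```python
import re

# Hypothetical subset of path templates; the first matching template wins.
TEMPLATES = [
    "/_search",
    "/_cluster/health",
    "/{target}/_search",
    "/{target}/_doc/{id}",
]


def _compile(template):
    """Turn a template such as /{target}/_doc/{id} into a regex with named groups."""
    parts = []
    for segment in template.strip("/").split("/"):
        if segment == "{target}":
            parts.append(r"(?P<target>[^/]+)")
        elif segment == "{id}":
            parts.append(r"(?P<id>[^/]+)")
        else:
            parts.append(re.escape(segment))
    return re.compile("/" + "/".join(parts))


COMPILED = [(template, _compile(template)) for template in TEMPLATES]


def describe_request(method, path):
    """Return (span_name, attributes) for a request, using the templates above."""
    for template, pattern in COMPILED:
        match = pattern.fullmatch(path)
        if match is None:
            continue
        groups = match.groupdict()
        attributes = {}
        if "target" in groups:
            attributes["db.elasticsearch.target"] = groups["target"]
        if "id" in groups:
            attributes["db.elasticsearch.doc_id"] = groups["id"]
        return f"{method} {template}", attributes
    # Assumption: no template matched, so the exact path is used as-is.
    return f"{method} {path}", {}


print(describe_request("GET", "/test-index/_doc/123"))
print(describe_request("POST", "/_search"))
```

With this table, `/test-index/_doc/123` yields the span name `GET /{target}/_doc/{id}`, and `/_search` keeps its exact path, matching the examples in the proposed text.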

semantic_conventions/trace/database.yaml (outdated, resolved)
| Attribute | Type | Description | Examples | Requirement Level |
|----------------------------|---|----------------------------------------------------------------------|---------------------------------------------------------|------------------------|
| `db.elasticsearch.doc_id` | string | The document that the request targets, specified in the path. | `'123'` | Conditionally Required |
| `db.elasticsearch.target` | string | The name of the data stream or index that is targeted. | `'users'` | Conditionally Required |
Member

is this also extracted from the path?

Suggested change
| `db.elasticsearch.target` | string | The name of the data stream or index that is targeted. | `'users'` | Conditionally Required |
| `db.elasticsearch.target` | string | The name of the data stream or index that is targeted, specified in the path. | `'users'` | Conditionally Required |

Author

That depends on the implementation of the client. It's possible that the client is written in such a way that the target is specified as an argument or as part of a hash passed to the function/method. The Ruby client, for example, has the target specified in the hash passed as an arg to the action method.
Example source / usage
So your suggested change makes sense; the target is determined not necessarily from the path, though it can appear as part of the path. I had this wording in there because I wanted to emphasize that the db.elasticsearch.target span attribute should match exactly what is in the path and what is replaced by {target} in the span name.
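The point about arguments can be sketched as follows: when the client already receives the target and document id as call arguments, it can set the attributes from those values and build the path from the same values, so the attribute matches what `{target}` stands for in the span name. The function `build_get_request` and the shape of `args` are hypothetical and only mirror the hash-style call described above.

```python
from urllib.parse import quote


def build_get_request(args):
    """Illustrative only: derive path, span name, and attributes from call arguments."""
    index = args["index"]
    doc_id = args["id"]

    # The path is built from the arguments, so nothing has to be parsed back out.
    path = f"/{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}"
    span_name = "GET /{target}/_doc/{id}"
    attributes = {
        "db.elasticsearch.target": index,
        "db.elasticsearch.doc_id": doc_id,
    }
    return path, span_name, attributes


print(build_get_request({"index": "users", "id": "123"}))
```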

@github-actions

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Apr 25, 2023
@trask trask removed the Stale label Apr 25, 2023
@estolfo
Author

estolfo commented Apr 25, 2023

@trask What are the next steps for this? Have I answered all your questions?

@trask
Member

trask commented Apr 25, 2023

@open-telemetry/opentelemetry-python-contrib-approvers @open-telemetry/dotnet-contrib-approvers @open-telemetry/java-instrumentation-approvers any thoughts on this, since it affects existing instrumentation in these languages? thx

@estolfo
Author

estolfo commented Apr 28, 2023

@trask We will bring it up at the Python, Java, and .NET SIGs next week if we don't hear from the teams before then.

Update: because the .NET instrumentation is native to the Elasticsearch client itself, I checked with the maintainer of that project. He said he can't say with certainty that he can get the information without a reasonable amount of work in the client, but he suspects it should be possible to add most if not all of these onto the .NET spans in a way that is backwards compatible.

@estolfo
Author

estolfo commented May 8, 2023

@trask
The Python group said they have no concerns about using these semantic conventions. They don't consider their consent a blocker and aren't concerned about backwards compatibility. They don't consider the instrumentation stable so are open to changing it.

[Not relevant to the discussion in this PR] For what it's worth, they said they'd rather have the instrumentation done natively in the Elasticsearch python client, rather than in the contrib repo.

@reyang
Member

reyang commented May 9, 2023

@estolfo heads up - most likely this PR will be closed, and we'll ask you to resubmit the PR in a new repo, please refer to #3474 (comment).

@estolfo
Author

estolfo commented May 9, 2023

@reyang Sure, no problem, thanks for letting me know. I see that the repo isn't created yet so I'll open a PR when it's available.

Contributor

@jsuereth jsuereth left a comment

@estolfo
Author

estolfo commented May 15, 2023

Opened new PR here

@github-actions

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label May 23, 2023
tag: call-level-tech-specific
brief: >
  The query params of the request, as a json string.
examples: [ '"{\"q\":\"test\"}", "{\"refresh\":true}"' ]
Member

Suggested change
examples: [ '"{\"q\":\"test\"}", "{\"refresh\":true}"' ]
examples: [ 'q=test&refresh=true' ]
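For comparison, the two encodings under discussion (a JSON string versus a URL query string) could be produced along these lines. This is a sketch using only the Python standard library; the helper names are hypothetical, and rendering booleans as lowercase `true`/`false` in the query string is an assumption chosen to match the suggested example.

```python
import json
from urllib.parse import urlencode


def _scalar(value):
    # Assumption: booleans are written the way they appear in a URL (true/false).
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def params_as_json(params):
    """Encode query params as a compact JSON string."""
    return json.dumps(params, separators=(",", ":"))


def params_as_query_string(params):
    """Encode query params as a URL query string."""
    return urlencode({key: _scalar(value) for key, value in params.items()})


params = {"q": "test", "refresh": True}
print(params_as_json(params))
print(params_as_query_string(params))
```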

@github-actions

github-actions bot commented Jun 3, 2023

Closed as inactive. Feel free to reopen if this PR is still being worked on.

Labels
area:semantic-conventions Related to semantic conventions semconv:database spec:trace Related to the specification/trace directory Stale
Development

Successfully merging this pull request may close these issues.

Add semantic conventions for Elasticsearch client instrumentation
7 participants