Database general-purpose target attribute #521

alanwest · 2023-11-14T19:51:16Z

Proposal for new `db.target` attribute

The type of targets available for a database operation is dependent on a database system. For example, the target of a SQL database operation may be a table, function, or stored procedure. Other database systems may refer to their targets as collections or indexes, etc.

Currently, there exists database system specific attributes for describing this. Examples include:

Introducing a general-purpose attribute would enable backends to provide a generic experience for faceting database operations by their target without requiring the backend have knowledge of all the database system specific attributes. It would also enable end users to more easily explore their data at a high-level without having to know the specific database systems in use.

I propose a new call-level attribute db.target for describing the target of a database operation. The term "target" may not be the greatest, but I'm using it for lack of a better term for the moment. I considered db.operation.target but of course db.operation is already an attribute. Maybe db.operation could become db.operation.type thereby allowing db.operation.target.

Issues to consider

Similar to db.operation, the value of the db.target may not be readily available without parsing the value of db.statement, e.g. SELECT * FROM Users the db.operation is SELECT and the db.target is Users. The specification advises against parsing db.statement to derive other attributes. I consider this a limitation particularly once we have a database client operation metric (see Database client operation duration metric #512). The db.statement attribute is fine for spans but because of its high-cardinality will not work for metrics. Therefore, having db.operation and db.target attributes on metrics will be useful.
There may be many targets for a single database operation, e.g. SELECT * FROM Users U JOIN Company C ON ....

Perhaps parsing db.statement can be an opt-in feature?

Various backends have prior art for addressing these issues. For example:

Elastic APM performs a limited parsing to extract the operation and outer-most table name from a SQL statement. This is described in their agent specifications.
New Relic APM agents perform a similar limited parsing and includes operation/target on metric data. New Relic's agent specs are unfortunately not publicly available, so I do not have a link to share. Though here is New Relic Java agent source code that performs the parsing.
New Relic agents offer a feature for providing a "hint" of the database target. The hint can be added as a comment prior to the database statement to simplify parsing, e.g., /* nrhint:Company */ SELECT * FROM Users U JOIN Company C ON ....

The text was updated successfully, but these errors were encountered:

pyohannes · 2023-11-15T15:04:50Z

In the messaging space we have a similar scenario, there we settled on the more or less generic term destination, which covers both queues and topics.

We also ended up making messaging.destination a namespace. There's messaging.destination.name, and then there are further attributes to describe the destination. Maybe something similar makes sense if one decides to go with the db.target approach.

mateuszrzeszutek · 2023-11-15T15:40:19Z

Perhaps parsing db.statement can be an opt-in feature?

FWIW the javaagent already does that. We have a query sanitization feature that replaces all the query variables with ?. It uses a performant tokenizer that scans the query looking for the interesting parts: https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation-api-semconv/src/main/jflex/SqlSanitizer.jflex
Since we're already iterating over the query once, we figured we might as well try to extract the table name while we're at it.

jcocchi · 2024-02-09T04:18:06Z

Currently, there exists database system specific attributes for describing this. Examples include:

Cosmos DB also has a custom attribute for this called db.cosmosdb.container. Standardizing on a term for the semantic convention makes sense to me.

Perhaps parsing db.statement can be an opt-in feature?

Wouldn't the source library be responsible for populating the db.target, so each library would presumably only need to know how to parse its own db.statement (vs. the APM tool doing it)? Cosmos DB currently doesn't populate db.statement at all, and it doesn't apply to most of our operations.

dicko2 · 2024-02-11T02:06:11Z

This is one of our biggest barriers from moving from app insights to otel. App insights includes the whole statement, we use mainly stored procs though, and where we don't its parameterised query strings. It would cause cardinality issues if people are injecting values into the query. But I think people shouldn't be doing this anyway.

I don't see the value of the target begin the table alone. Think about what you need to debug a problem. If you put only a table then these two queries will be grouped together

SELECT * FROM users

SELECT name FROM users WHERE userid = @userid

They will have drastically different performance right? So why group them?

I think adding the whole query is the way to go imo.

trask · 2024-02-13T22:30:24Z

It would cause cardinality issues if people are injecting values into the query. But I think people shouldn't be doing this anyway.

I believe even properly parameterized statements are too high-cardinality for metrics, e.g. it's not uncommon to have 100's of tables

dicko2 · 2024-02-14T00:06:39Z

If you don't have the query. I would question the use of the metric. What are you measuring?

trask · 2024-02-16T16:37:20Z

Naming idea from semconv meeting: db.entity

Related: "entity" term may be used for identifying resource attributes

jack-berg · 2024-02-16T22:09:23Z

Compiling the query syntax docs from a sampling of the different db systems we reference:

In postgres you select from a from_item
In mysql you select from table_references
In redis you search an index
In mongodb you query a collection
In DynamoDb you query a TableName
In Cassandra you query a table_name
In Elasticsearch you query against an index
In cosmosdb you query a container

db.target seems like a decent choice.

I also like db.collection, since it implies a collections of rows, columns, records, documents, entries in an index, etc.

dicko2 · 2024-02-18T00:27:46Z

You could also be calling a stored procedure too

joaopgrassi · 2024-02-20T17:03:35Z

If you don't have the query. I would question the use of the metric. What are you measuring?

One could measure all db.operation in a certain db.target - say "Give me a count of all SELECT in target MYTABLE. Or "Give me a count of all EXEC in target MY_SPROC". If that's useful, that's another question.

The statement in a metric makes not much sense, but makes total sense in spans.

lmolkova · 2024-02-21T02:37:38Z

Assuming I do cross-table query, would db.operation and db.target be arrays?
Also, it'd be great to have it as opt-in on metrics, but array attribute on metrics is questionable.

I.e.

the default experience (for SQL) is not super-helpful (no table name, no statement, no operation)
opt-in statement parsing can add db.target and db.operation, but it's still far from perfect, especially if there is more than one operation/target

An 'ideal' solution as I see it would be if users could optionally give queries a nick-name which would appear on spans and metrics.
We don't have a common way to achieve it, but if there was one and we could get this nick-name, we would put it into attribute like db.query.name.

The propagation mechanism could look like this:

Context dbScope = Context.current();
try(Scope scope = dbScope.with(ContextKey.named("db.query.name"), "get user by id").makeCurrent()) {
    // SELECT name FROM users WHERE userid = @userid
}

It could be fragile in some languages/cases and probably needs a spec change to make it possible in all languages.

What we can do now is to define such attribute, add it to metrics as conditionally_required: when available (even though it never is). This way semconv can go stable, default experience would not be great, but we'll have means in the semconv to create a good one.

We have similar cases (messaging.destination.template, http.route on client)

lmolkova · 2024-02-21T02:51:10Z

Another option could be that:
assuming db.statement can be templated, user can opt-in into collecting a limited number of db.statement on metrics.

We could potentially extend metrics advisory params api to configure an explicit allow-list for db.statement values.

this way cardinality would be fully in the user hands.

trask · 2024-04-24T16:25:09Z

I think this is resolved by #870?

github-actions bot assigned joaopgrassi Nov 14, 2023

trask added this to Database Client Semantic Conventions Feb 7, 2024

trask mentioned this issue Feb 8, 2024

db.name should be broken down into individual layers #714

Closed

jcocchi mentioned this issue Feb 9, 2024

Should db.sql.table include db.name? #718

Closed

This was referenced Mar 10, 2024

How to record multi-operation/table/dbs operations on DB metrics #805

Open

Propose Context-scoped attributes. open-telemetry/oteps#207

Closed

trask closed this as completed Apr 24, 2024

github-project-automation bot moved this to Done in Database Client Semantic Conventions Apr 24, 2024

lmolkova mentioned this issue Oct 15, 2024

DB: fix how operation and collection names are recorded for complex queries (opt-in db.query.text on metrics, new db.query.summary recommended attribute) #1482

Merged

3 tasks

lmolkova mentioned this issue Oct 25, 2024

[database] reporting stored procedures/prepared statements #1491

Open

lmolkova reopened this Oct 29, 2024

lmolkova mentioned this issue Oct 29, 2024

Rename db.collection.name to db.target.name #1527

Closed

3 tasks

joaopgrassi removed their assignment Oct 29, 2024

joaopgrassi added the area:db label Oct 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Database general-purpose target attribute #521

Database general-purpose target attribute #521

alanwest commented Nov 14, 2023 •

edited by trask

Loading

pyohannes commented Nov 15, 2023

mateuszrzeszutek commented Nov 15, 2023

jcocchi commented Feb 9, 2024

dicko2 commented Feb 11, 2024

trask commented Feb 13, 2024

dicko2 commented Feb 14, 2024

trask commented Feb 16, 2024 •

edited

Loading

jack-berg commented Feb 16, 2024 •

edited

Loading

dicko2 commented Feb 18, 2024

joaopgrassi commented Feb 20, 2024 •

edited

Loading

lmolkova commented Feb 21, 2024 •

edited

Loading

lmolkova commented Feb 21, 2024 •

edited

Loading

trask commented Apr 24, 2024

Database general-purpose target attribute #521

Database general-purpose target attribute #521

Comments

alanwest commented Nov 14, 2023 • edited by trask Loading

Proposal for new db.target attribute

Issues to consider

pyohannes commented Nov 15, 2023

mateuszrzeszutek commented Nov 15, 2023

jcocchi commented Feb 9, 2024

dicko2 commented Feb 11, 2024

trask commented Feb 13, 2024

dicko2 commented Feb 14, 2024

trask commented Feb 16, 2024 • edited Loading

jack-berg commented Feb 16, 2024 • edited Loading

dicko2 commented Feb 18, 2024

joaopgrassi commented Feb 20, 2024 • edited Loading

lmolkova commented Feb 21, 2024 • edited Loading

lmolkova commented Feb 21, 2024 • edited Loading

trask commented Apr 24, 2024

alanwest commented Nov 14, 2023 •

edited by trask

Loading

Proposal for new `db.target` attribute

trask commented Feb 16, 2024 •

edited

Loading

jack-berg commented Feb 16, 2024 •

edited

Loading

joaopgrassi commented Feb 20, 2024 •

edited

Loading

lmolkova commented Feb 21, 2024 •

edited

Loading

lmolkova commented Feb 21, 2024 •

edited

Loading