Proposal for vector db semantic convention #1231

Open: wants to merge 44 commits into base: main. Changes shown from 36 of 44 commits.

Commits:
fa8ee30
Proposal for vector db semantic convention
ezimuel Jul 10, 2024
7068720
Merge + applied feedbacks #1231
ezimuel Jul 17, 2024
5e12a86
Removed allow_custom_values: true in db.yaml
ezimuel Jul 17, 2024
3b61784
Fixed merge
ezimuel Jul 18, 2024
828bacc
Merge branch 'main' into vector-db
ezimuel Jul 20, 2024
53d82d4
Merge remote-tracking branch 'upstream/main' into vector-db
ezimuel Aug 5, 2024
a3330ff
Updated dimension_count and similarity_metric
ezimuel Aug 5, 2024
e5ff387
Merge remote-tracking branch 'origin/vector-db' into vector-db
ezimuel Aug 5, 2024
da6649b
Merge branch 'main' into vector-db
ezimuel Aug 7, 2024
d99ec10
Fix array attribute examples (#1325)
lmolkova Aug 8, 2024
61b0f2c
Add k8s.{pod,node}.cpu.{time,usage} metrics (#1320)
ChrsMark Aug 11, 2024
ceae2ca
Db metrics pending requests (#1290)
maryliag Aug 12, 2024
6db7ec5
Fix `process.args_count` attribute (#1331)
lmolkova Aug 12, 2024
e5e0d9d
Add k8s.volume.{name,type} attributes (#1251)
ChrsMark Aug 14, 2024
ae0e066
Add tests for rego policies (#1334)
MadVikingGod Aug 14, 2024
03b67bf
add `nodejs.eventloop.time` metric (#1259)
maryliag Aug 15, 2024
93d2cbe
chore: Remove support for the event `fields` referencing/inheriting d…
MSNev Aug 18, 2024
f411554
Attempt to optimise attribute name collision checks. (#1328)
jsuereth Aug 19, 2024
daa0a14
(chore) Add dependabot config to keep tooling up to date. (#1346)
jsuereth Aug 19, 2024
bc8a63c
Fix broken docker link (#1332)
ChrsMark Aug 19, 2024
a5f8661
Bump markdownlint-cli from 0.31.0 to 0.41.0 (#1349)
dependabot[bot] Aug 19, 2024
d996cd9
Bump go.opentelemetry.io/build-tools/chloggen from 0.12.0 to 0.14.0 i…
dependabot[bot] Aug 19, 2024
a10e75f
Bump gulp from 4.0.2 to 5.0.0 (#1348)
dependabot[bot] Aug 19, 2024
fd0f2e7
Fix link anchors (#1354)
lmolkova Aug 19, 2024
1c6bd00
chore: update ids (#1352)
maryliag Aug 20, 2024
9feb74d
Removed db.vector.id and added db.record.id, renamed db.vector.field_…
ezimuel Aug 20, 2024
2357766
Merge branch 'main' into vector-db
ezimuel Aug 20, 2024
81dca47
Merge from upstream/main
ezimuel Sep 25, 2024
ff03da1
Removed db.vector.model and moved db.vector.search.similarity_metric …
ezimuel Sep 25, 2024
523bcb9
Merge branch 'main' into vector-db
ezimuel Sep 30, 2024
fd891f6
Merge remote-tracking branch 'upstream/main' into vector-db
ezimuel Nov 5, 2024
bc2ddb1
Merge remote-tracking branch 'upstream/main' into vector-db
ezimuel Nov 5, 2024
fc90f3f
Merge branch 'vector-db' of github.com:ezimuel/semantic-conventions i…
ezimuel Nov 5, 2024
24cc812
Merge remote-tracking branch 'upstream/main' into vector-db
ezimuel Nov 18, 2024
3d50dc1
Merge remote-tracking branch 'upstream/main' into vector-db
ezimuel Dec 2, 2024
06a67d4
Added docs + spans
ezimuel Dec 2, 2024
949b198
Merge remote-tracking branch 'upstream/main' into vector-db
ezimuel Dec 3, 2024
a88ac32
Update model/database/spans.yaml
ezimuel Dec 5, 2024
df19d76
Update model/database/spans.yaml
ezimuel Dec 5, 2024
94c2ca1
Update model/database/spans.yaml
ezimuel Dec 5, 2024
210ecb9
Update model/database/spans.yaml
ezimuel Dec 5, 2024
765a4a8
Update model/database/spans.yaml
ezimuel Dec 5, 2024
ccd11f7
Merge remote-tracking branch 'upstream/main' into vector-db
ezimuel Dec 5, 2024
4def570
Merge branch 'vector-db' of github.com:ezimuel/semantic-conventions i…
ezimuel Dec 5, 2024
50 changes: 42 additions & 8 deletions docs/attributes-registry/db.md
@@ -10,6 +10,8 @@
- [Cassandra Attributes](#cassandra-attributes)
- [Azure Cosmos DB Attributes](#azure-cosmos-db-attributes)
- [Elasticsearch Attributes](#elasticsearch-attributes)
- [Search attributes](#search-attributes)
- [Db Vector Attributes](#db-vector-attributes)
- [Deprecated Database Attributes](#deprecated-database-attributes)
- [Deprecated Database Metrics](#deprecated-database-metrics)

@@ -28,9 +30,10 @@ This group defines the attributes used to describe telemetry in the context of d
| <a id="db-operation-parameter" href="#db-operation-parameter">`db.operation.parameter.<key>`</a> | string | A database operation parameter, with `<key>` being the parameter name, and the attribute value being a string representation of the parameter value. [5] | `someval`; `55` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="db-query-summary" href="#db-query-summary">`db.query.summary`</a> | string | Low cardinality representation of a database query text. [6] | `SELECT wuser_table`; `INSERT shipping_details SELECT orders`; `get user by id` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="db-query-text" href="#db-query-text">`db.query.text`</a> | string | The database query being executed. [7] | `SELECT * FROM wuser_table where username = ?`; `SET mykey ?` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="db-record-id" href="#db-record-id">`db.record.id`</a> | string | The ID of the record [8] | `1`; `5c56c793-69f3-4fbf-87e6-c4bf54c28c26` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="db-response-returned-rows" href="#db-response-returned-rows">`db.response.returned_rows`</a> | int | Number of rows returned by the operation. | `10`; `30`; `1000` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="db-response-status-code" href="#db-response-status-code">`db.response.status_code`</a> | string | Database response status code. [8] | `102`; `ORA-17002`; `08P01`; `404` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="db-system" href="#db-system">`db.system`</a> | string | The database management system (DBMS) product as identified by the client instrumentation. [9] | `other_sql`; `adabas`; `cache` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="db-response-status-code" href="#db-response-status-code">`db.response.status_code`</a> | string | Database response status code. [9] | `102`; `ORA-17002`; `08P01`; `404` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="db-system" href="#db-system">`db.system`</a> | string | The database management system (DBMS) product as identified by the client instrumentation. [10] | `other_sql`; `adabas`; `cache` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

**[1] `db.collection.name`:** It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization.

@@ -76,11 +79,13 @@ For batch operations, if the individual operations are known to have the same qu
Even though parameterized query text can potentially have sensitive data, by using a parameterized query the user is giving a strong signal that any sensitive data will be passed as parameter values, and the benefit to observability of capturing the static part of the query text by default outweighs the risk.
This attribute has stability level RELEASE CANDIDATE.

**[8] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes.
**[8] `db.record.id`:** This can also be the ID of the vector, in the case of a vector database.

**[9] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes.
Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system.
This attribute has stability level RELEASE CANDIDATE.

**[9] `db.system`:** The actual DBMS may differ from the one identified by the client. For example, when using PostgreSQL client libraries to connect to a CockroachDB, the `db.system` is set to `postgresql` based on the instrumentation's best knowledge.
**[10] `db.system`:** The actual DBMS may differ from the one identified by the client. For example, when using PostgreSQL client libraries to connect to a CockroachDB, the `db.system` is set to `postgresql` based on the instrumentation's best knowledge.
This attribute has stability level RELEASE CANDIDATE.

---
@@ -193,12 +198,12 @@ This group defines attributes for Azure Cosmos DB.
| <a id="db-cosmosdb-client-id" href="#db-cosmosdb-client-id">`db.cosmosdb.client_id`</a> | string | Unique Cosmos client instance id. | `3ba4827d-4422-483f-b59f-85b74211c11d` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="db-cosmosdb-connection-mode" href="#db-cosmosdb-connection-mode">`db.cosmosdb.connection_mode`</a> | string | Cosmos client connection mode. | `gateway`; `direct` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="db-cosmosdb-consistency-level" href="#db-cosmosdb-consistency-level">`db.cosmosdb.consistency_level`</a> | string | Account or request [consistency level](https://learn.microsoft.com/azure/cosmos-db/consistency-levels). | `Eventual`; `ConsistentPrefix`; `BoundedStaleness`; `Strong`; `Session` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="db-cosmosdb-regions-contacted" href="#db-cosmosdb-regions-contacted">`db.cosmosdb.regions_contacted`</a> | string[] | List of regions contacted during operation in the order that they were contacted. If there is more than one region listed, it indicates that the operation was performed on multiple regions i.e. cross-regional call. [10] | `["North Central US", "Australia East", "Australia Southeast"]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="db-cosmosdb-regions-contacted" href="#db-cosmosdb-regions-contacted">`db.cosmosdb.regions_contacted`</a> | string[] | List of regions contacted during operation in the order that they were contacted. If there is more than one region listed, it indicates that the operation was performed on multiple regions i.e. cross-regional call. [11] | `["North Central US", "Australia East", "Australia Southeast"]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="db-cosmosdb-request-charge" href="#db-cosmosdb-request-charge">`db.cosmosdb.request_charge`</a> | double | Request units consumed for the operation. | `46.18`; `1.0` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="db-cosmosdb-request-content-length" href="#db-cosmosdb-request-content-length">`db.cosmosdb.request_content_length`</a> | int | Request payload size in bytes. | | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="db-cosmosdb-sub-status-code" href="#db-cosmosdb-sub-status-code">`db.cosmosdb.sub_status_code`</a> | int | Cosmos DB sub status code. | `1000`; `1002` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

**[10] `db.cosmosdb.regions_contacted`:** Region name matches the format of `displayName` in [Azure Location API](https://learn.microsoft.com/rest/api/subscription/subscriptions/list-locations?view=rest-subscription-2021-10-01&tabs=HTTP#location)
**[11] `db.cosmosdb.regions_contacted`:** Region name matches the format of `displayName` in [Azure Location API](https://learn.microsoft.com/rest/api/subscription/subscriptions/list-locations?view=rest-subscription-2021-10-01&tabs=HTTP#location)

---

@@ -228,9 +233,38 @@ This group defines attributes for Elasticsearch.
| Attribute | Type | Description | Examples | Stability |
|---|---|---|---|---|
| <a id="db-elasticsearch-node-name" href="#db-elasticsearch-node-name">`db.elasticsearch.node.name`</a> | string | Represents the human-readable identifier of the node/instance to which a request was routed. | `instance-0000000001` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="db-elasticsearch-path-parts" href="#db-elasticsearch-path-parts">`db.elasticsearch.path_parts.<key>`</a> | string | A dynamic value in the url path. [11] | `db.elasticsearch.path_parts.index=test-index`; `db.elasticsearch.path_parts.doc_id=123` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="db-elasticsearch-path-parts" href="#db-elasticsearch-path-parts">`db.elasticsearch.path_parts.<key>`</a> | string | A dynamic value in the url path. [12] | `db.elasticsearch.path_parts.index=test-index`; `db.elasticsearch.path_parts.doc_id=123` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

**[12] `db.elasticsearch.path_parts`:** Many Elasticsearch url paths allow dynamic values. These SHOULD be recorded in span attributes in the format `db.elasticsearch.path_parts.<key>`, where `<key>` is the url path part name. The implementation SHOULD reference the [elasticsearch schema](https://raw.githubusercontent.com/elastic/elasticsearch-specification/main/output/schema/schema.json) in order to map the path part values to their names.
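
As an illustration of the `db.elasticsearch.path_parts.<key>` format described in note [12], the following minimal Python sketch maps already-resolved path part names and values to attribute keys. The helper name `path_part_attributes` is hypothetical, and the sketch assumes the caller has already looked up the path part names (for example from the Elasticsearch schema); it does not parse URLs itself.

```python
from typing import Dict, Mapping

# Attribute key prefix taken from the convention text above.
PATH_PARTS_PREFIX = "db.elasticsearch.path_parts."


def path_part_attributes(path_parts: Mapping[str, str]) -> Dict[str, str]:
    """Build span attributes from resolved URL path parts.

    `path_parts` maps a path part name (e.g. "index") to its value.
    This helper is hypothetical and only illustrates the key format.
    """
    return {f"{PATH_PARTS_PREFIX}{name}": str(value) for name, value in path_parts.items()}


attrs = path_part_attributes({"index": "test-index", "doc_id": "123"})
```

With the inputs shown, the keys match the examples in the table row: `db.elasticsearch.path_parts.index` and `db.elasticsearch.path_parts.doc_id`.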

## Search attributes

This group defines attributes for Search.

| Attribute | Type | Description | Examples | Stability |
|---|---|---|---|---|
| <a id="db-search-similarity-metric" href="#db-search-similarity-metric">`db.search.similarity_metric`</a> | string | The metric used in similarity search. | `cosine` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

---

**[11] `db.elasticsearch.path_parts`:** Many Elasticsearch url paths allow dynamic values. These SHOULD be recorded in span attributes in the format `db.elasticsearch.path_parts.<key>`, where `<key>` is the url path part name. The implementation SHOULD reference the [elasticsearch schema](https://raw.githubusercontent.com/elastic/elasticsearch-specification/main/output/schema/schema.json) in order to map the path part values to their names.
`db.search.similarity_metric` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| `cosine` | The cosine metric. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `dot` | The dot product metric. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `euclidean` | The euclidean distance metric. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `manhattan` | The Manhattan distance metric. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

Review comment (Contributor), on the sentence introducing the well-known values:

The similarity metric doesn't need to be defined per search request. It can also be defined when the collection is created.

Review comment (Contributor), on the `cosine` row:

nit:

  1. There are cosine similarity and cosine distance. Better to distinguish the two in the convention.
  2. There are a few common ones that are not listed here: Squared Euclidean and Hamming.
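
To make the distinction raised in review concrete, here is a minimal Python sketch of the metrics named in the table and in the review comment. The function names are hypothetical and not part of the proposal. Cosine distance is taken here as `1 - cosine similarity`, which is a common convention but an assumption as far as this PR is concerned, since the proposal does not define it.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def _check_same_dimension(a: Vector, b: Vector) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def dot(a: Vector, b: Vector) -> float:
    """Dot product (the `dot` value)."""
    _check_same_dimension(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; higher means more similar."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / norm


def cosine_distance(a: Vector, b: Vector) -> float:
    """Assumed convention: 1 - cosine similarity; lower means more similar."""
    return 1.0 - cosine_similarity(a, b)


def squared_euclidean(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (mentioned in review, not in the table)."""
    _check_same_dimension(a, b)
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance (the `euclidean` value)."""
    return math.sqrt(squared_euclidean(a, b))


def manhattan(a: Vector, b: Vector) -> float:
    """Manhattan (L1) distance (the `manhattan` value)."""
    _check_same_dimension(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where a and b differ (mentioned in review)."""
    _check_same_dimension(a, b)
    return sum(1 for x, y in zip(a, b) if x != y)
```

Note that a similarity and a distance rank results in opposite directions, which is why a single `cosine` value can be ambiguous for anyone interpreting scores in telemetry.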

## Db Vector Attributes

This group defines attributes for vector databases.

| Attribute | Type | Description | Examples | Stability |
|---|---|---|---|---|
| <a id="db-vector-dimension-count" href="#db-vector-dimension-count">`db.vector.dimension_count`</a> | int | The dimension of the vector. | `3` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="db-vector-field-name" href="#db-vector-field-name">`db.vector.field_name`</a> | string | The name of the field that holds the vector. | `vector` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="db-vector-query-top-k" href="#db-vector-query-top-k">`db.vector.query.top_k`</a> | int | The number of most similar vectors (top-k) returned by a query. | `5` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

Review comment (Contributor), on the `db.vector.dimension_count` row:

nit: db-vector-dimension-count -> db-vector-dimension.
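
A minimal Python sketch of how an instrumentation might assemble the attributes proposed in this PR for a vector search. The function `vector_search_attributes` and its parameters are hypothetical; only the attribute names come from the tables in this diff, and they are experimental and may still change during review.

```python
from typing import Dict, Optional, Sequence, Union

AttributeValue = Union[str, int]


def vector_search_attributes(
    query_vector: Sequence[float],
    top_k: int,
    field_name: Optional[str] = None,
    similarity_metric: Optional[str] = None,
) -> Dict[str, AttributeValue]:
    """Collect the proposed vector DB attributes for one search call.

    Hypothetical helper: it only builds a plain dict and does not
    depend on any tracing library.
    """
    if top_k <= 0:
        raise ValueError("top_k must be a positive integer")

    attributes: Dict[str, AttributeValue] = {
        # Dimension of the query vector, e.g. 3 for [0.1, 0.2, 0.3].
        "db.vector.dimension_count": len(query_vector),
        "db.vector.query.top_k": top_k,
    }
    # Optional attributes are set only when the caller knows them.
    if field_name is not None:
        attributes["db.vector.field_name"] = field_name
    if similarity_metric is not None:
        attributes["db.search.similarity_metric"] = similarity_metric
    return attributes


attrs = vector_search_attributes(
    [0.1, 0.2, 0.3], top_k=5, field_name="vector", similarity_metric="cosine"
)
```

The resulting dict could then be attached to a span, for example through `Span.set_attributes` in the OpenTelemetry Python API. Keeping the helper free of tracing dependencies is a design choice for this sketch, not something the proposal prescribes.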

## Deprecated Database Attributes

1 change: 1 addition & 0 deletions docs/database/README.md
@@ -55,5 +55,6 @@ Technology specific semantic conventions are defined for the following databases
* [MSSQL](mssql.md): Semantic Conventions for *MSSQL*.
* [Redis](redis.md): Semantic Conventions for *Redis*.
* [SQL](sql.md): Semantic Conventions for *SQL* databases.
* [Vector DB](vector.md): Semantic Conventions for *Vector* databases.

[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status