Add authentication sub-namespace to user #1146

Closed
wants to merge 22 commits into from
22 changes: 22 additions & 0 deletions .chloggen/add_authentication_user_subnamespace.yaml
@@ -0,0 +1,22 @@
# Use this changelog template to create an entry for release notes.
#
# If your change doesn't affect end users you should instead start
# your pull request title with [chore] or use the "Skip Changelog" label.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the area of concern in the attributes-registry, (e.g. http, cloud, db)
component: user

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: introduce subnamespace `user.authentication` with a new attribute `user.authentication.id`

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
# The values here must be integers.
issues: [1104]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: Update `identity` attributes under general attribute doc.
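The template comments above state two machine-checkable constraints: `change_type` must be one of five values, and `issues` is mandatory and must contain integers. As a rough sketch (the helper name is hypothetical, and the real chloggen tooling may check more or different things), those constraints could be checked against the parsed entry like this:

```python
# Constraints taken from the template comments only; anything the real
# chloggen tool checks beyond these is not modeled here.
ALLOWED_CHANGE_TYPES = {"breaking", "deprecation", "new_component", "enhancement", "bug_fix"}


def validate_chloggen_entry(entry: dict) -> list:
    """Hypothetical helper: return a list of problems (empty means valid)."""
    problems = []
    if entry.get("change_type") not in ALLOWED_CHANGE_TYPES:
        problems.append("change_type must be one of " + ", ".join(sorted(ALLOWED_CHANGE_TYPES)))
    issues = entry.get("issues")
    # "Mandatory" and "must be integers"; bool is rejected explicitly
    # because Python treats True/False as ints.
    if not isinstance(issues, list) or not issues or not all(
        isinstance(i, int) and not isinstance(i, bool) for i in issues
    ):
        problems.append("issues must be a non-empty list of integers")
    return problems


entry = {
    "change_type": "enhancement",
    "component": "user",
    "note": "introduce subnamespace `user.authentication` with a new attribute `user.authentication.id`",
    "issues": [1104],
    "subtext": "Update `identity` attributes under general attribute doc.",
}
print(validate_chloggen_entry(entry))  # []
```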
27 changes: 17 additions & 10 deletions docs/attributes-registry/user.md
@@ -10,13 +10,20 @@

Describes information about the user.

| Attribute | Type | Description | Examples | Stability |
| ---------------- | -------- | ---------------------------------------------------------------------------- | -------------------------------------------------- | ---------------------------------------------------------------- |
| `user.email` | string | User email address. | `a.einstein@example.com` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `user.full_name` | string | User's full name | `Albert Einstein` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `user.hash` | string | Unique user hash to correlate information for a user in anonymized form. [1] | `364fc68eaf4c8acec74a4e52d7d1feaa` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `user.id` | string | Unique identifier of the user. | `S-1-5-21-202424912787-2692429404-2351956786-1000` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `user.name` | string | Short name or login/username of the user. | `a.einstein` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `user.roles` | string[] | Array of user roles at the time of the event. | `["admin", "reporting_user"]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

**[1]:** Useful if `user.id` or `user.name` contain confidential information and cannot be used.
| Attribute | Type | Description | Examples | Stability |
| ------------------------ | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------- | ---------------------------------------------------------------- |
| `user.authentication.id` | string | Unique identifier of an authenticated user in the system. [1] | `S-1-5-21-202424912787-2692429404-2351956786-1000` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `user.email` | string | User email address. | `a.einstein@example.com` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `user.full_name` | string | User's full name | `Albert Einstein` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `user.hash` | string | Unique user hash to correlate information for a user in anonymized form. [2] | `364fc68eaf4c8acec74a4e52d7d1feaa` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `user.id` | string | Identifies a user interacting with a system regardless of user authentication status. This identifier may be unique only through best-effort means. [3] | `QdH5CAWJgqVT4rOr0qtumf` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `user.name` | string | Short name or login/username of the user. | `a.einstein` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `user.roles` | string[] | Array of user roles at the time of the event. | `["admin", "reporting_user"]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

**[1]:** The `user.authentication.id` MAY be used to identify a user attempting to authenticate if it's known at this stage.

**[2]:** Useful if `user.authentication.id` or `user.name` contain confidential information and cannot be used.

**[3]:** The `user.id`, when populated, is expected to be generated before user is authenticated and SHOULD NOT change after the user logs in. In browser scenarios `user.id` is usually stored in cookies.
It's NOT RECOMMENDED to populate this attribute when unauthenticated users are not tracked or identified by the system.
Contributor

I would remove this line; OpenTelemetry should not be making these statements.

It's up to the application to decide whether or not it wants to track the user.id when the user is authenticated. For client applications this is almost ALWAYS required, as they use it to identify the "user" in some random way.

Contributor

lmolkova, Aug 8, 2024

why?

we want authenticated user id to always be in the same attribute user.authentication.id. If you don't have means to track unauthenticated users, you should never populate user.id.

Otherwise, how do you see people using those attributes?

Contributor Author

Only the authenticated user ID can be tracked through user.authentication.id, and everything else is tracked through 'user.id': it can be anonymous, random, pseudo, unauthenticated, unauthorized, etc. Can we avoid making any recommendation on how OpenTelemetry customers use this attribute? As long as we have a crystal-clear description, it's up to the users to decide. Does that help?

Contributor

lmolkova, Aug 9, 2024

We're recommending using user.id only if the app has some means to identify users (without authentication).
If the app has no means to identify users before authentication, user.id should not be used.

The app can identify them by generating a GUID on every call; that's also a form of identification/tracking.

So from what I can tell this NOT RECOMMENDED just reinforces the description and does not contradict anything you're saying.

I want to keep this sentence since I expect people to put things they don't need into user.id and want to do as much as possible to avoid it. What's the problem in keeping it?

Contributor Author

I don't have a strong opinion on this; your argument seems valid. let's try to resolve it in Monday's SIG with @MSNev and everyone else.

Member

I’m finding these names a bit confusing, and agree with this:

> I expect people to put things they don't need into user.id and want to do as much as possible to avoid it

It seems that unauthenticated/tracking/pseudo user ids are fairly specific to client (browser/mobile) instrumentation, so maybe it’s better to have a more specific name for that concept (and not use user.id for it)?

It can be a random guid or a hash of the user's IP address. This is different from `user.hash` which is a hash of a known `user.id` or `user.name`.
heyams marked this conversation as resolved.
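The note above says `user.id` can be a random GUID or a hash of the user's IP address, while `user.hash` is a hash of a known identity. A minimal sketch of that distinction follows. The keyed SHA-256 (HMAC) hash, the secret key, and all function names are assumptions for illustration; the conventions do not prescribe a hash algorithm.

```python
import hashlib
import hmac
import uuid

# Hypothetical per-deployment secret. A keyed hash is used instead of a bare
# hash because unsalted hashes of IPv4 addresses are easy to brute-force.
SECRET_KEY = b"replace-with-a-real-secret"


def _keyed_hash(value: str) -> str:
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def random_user_id() -> str:
    """user.id option 1: a random GUID, generated once and then persisted."""
    return str(uuid.uuid4())


def ip_based_user_id(ip_address: str) -> str:
    """user.id option 2: a hash of the user's IP address (best-effort unique only)."""
    return _keyed_hash(ip_address)


def user_hash(known_identity: str) -> str:
    """user.hash: anonymized form of a known identity such as
    user.authentication.id or user.name."""
    return _keyed_hash(known_identity)


attributes = {
    "user.id": ip_based_user_id("203.0.113.7"),  # documentation-range IP
    "user.hash": user_hash("a.einstein"),
}
print(attributes)
```

The difference is in the input, not the mechanism: `user.id` never needs a known identity, whereas `user.hash` only makes sense once one exists.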
16 changes: 13 additions & 3 deletions docs/general/attributes.md
@@ -409,9 +409,19 @@ These attributes may be used for any operation with an authenticated and/or auth

| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`enduser.id`](/docs/attributes-registry/enduser.md) | string | Deprecated, use `user.id` instead. | `username` | `Recommended` | ![Deprecated](https://img.shields.io/badge/-deprecated-red)<br>Replaced by `user.id` attribute. |
| [`enduser.role`](/docs/attributes-registry/enduser.md) | string | Deprecated, use `user.roles` instead. | `admin` | `Recommended` | ![Deprecated](https://img.shields.io/badge/-deprecated-red)<br>Replaced by `user.roles` attribute. |
| [`enduser.scope`](/docs/attributes-registry/enduser.md) | string | Deprecated, no replacement at this time. | `read:message, write:files` | `Recommended` | ![Deprecated](https://img.shields.io/badge/-deprecated-red)<br>Removed. |
| [`user.id`](/docs/attributes-registry/user.md) | string | Identifies a user interacting with a system regardless of user authentication status. This identifier may be unique only through best-effort means. [1] | `QdH5CAWJgqVT4rOr0qtumf` | `Conditionally Required` [2] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`user.authentication.id`](/docs/attributes-registry/user.md) | string | Unique identifier of an authenticated user in the system. [3] | `S-1-5-21-202424912787-2692429404-2351956786-1000` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`user.roles`](/docs/attributes-registry/user.md) | string[] | Array of user roles at the time of the event. | `["admin", "reporting_user"]` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

**[1]:** The `user.id`, when populated, is expected to be generated before user is authenticated and SHOULD NOT change after the user logs in. In browser scenarios `user.id` is usually stored in cookies.
It's NOT RECOMMENDED to populate this attribute when unauthenticated users are not tracked or identified by the system.
Contributor

Again remove

It can be a random guid or a hash of the user's IP address. This is different from `user.hash` which is a hash of a known `user.id` or `user.name`.

**[2]:** If instrumentation supports tracking unauthenticated users and if `user.authentication.id` is set, recommended otherwise.

**[3]:** The `user.authentication.id` MAY be used to identify a user attempting to authenticate if it's known at this stage.




<!-- markdownlint-restore -->
13 changes: 7 additions & 6 deletions model/general.yaml
@@ -34,15 +34,16 @@ groups:
- ref: peer.service
requirement_level: recommended
- id: identity
type: span
type: attribute_group
brief: >
These attributes may be used for any operation with an authenticated and/or authorized enduser.
These attributes may be used for any operation with an authenticated and/or authorized user.
attributes:
- ref: enduser.id
requirement_level: recommended
- ref: enduser.role
- ref: user.id
requirement_level:
conditionally_required: If instrumentation supports tracking unauthenticated users and if `user.authentication.id` is set, recommended otherwise.
Contributor

lmolkova, Aug 8, 2024

sorry, I think I got it wrong in my previous suggestion.

We don't want user.id to be used for authenticated users, do we? so I guess it should be

Suggested change
conditionally_required: If instrumentation supports tracking unauthenticated users and if `user.authentication.id` is set, recommended otherwise.
conditionally_required: If and only if instrumentation supports tracking unauthenticated users.

wdyt?
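To compare the two wordings side by side, this sketch enumerates both inputs. Reading the suggested wording's "otherwise" as "do not populate" is an interpretation drawn from the NOT RECOMMENDED note, not text from the PR, and the function names are hypothetical.

```python
from itertools import product


def original_level(tracks_unauthenticated: bool, auth_id_set: bool) -> str:
    # PR wording: conditionally required "if instrumentation supports tracking
    # unauthenticated users and if user.authentication.id is set",
    # recommended otherwise.
    if tracks_unauthenticated and auth_id_set:
        return "conditionally_required"
    return "recommended"


def suggested_level(tracks_unauthenticated: bool, auth_id_set: bool) -> str:
    # Suggested wording: conditionally required "if and only if instrumentation
    # supports tracking unauthenticated users". auth_id_set is deliberately
    # ignored; it is kept only so both functions share a signature.
    if tracks_unauthenticated:
        return "conditionally_required"
    return "not_recommended"  # interpretation based on the user.id note


for tracks, auth in product([True, False], repeat=2):
    print(
        f"tracks_unauthenticated={tracks!s:5} auth_id_set={auth!s:5} "
        f"original={original_level(tracks, auth):22} "
        f"suggested={suggested_level(tracks, auth)}"
    )
```

Under the original wording, a system that does not track unauthenticated users still ends up at `recommended` for `user.id`, which sits awkwardly next to the NOT RECOMMENDED note; the suggested wording drops the dependency on `user.authentication.id`.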

Contributor

lmolkova, Aug 8, 2024

I believe the goal is to prevent someone from duplicating things (or using user.id for the authenticated one), like:

❌ Bad:

user.authentication.id = lmolkova
user.id = lmolkova

❌ Bad:

user.id = lmolkova

we want people to do

✅ Good:

user.id = QdH5CAWJgqVT4rOr0qtumf

✅ Good:

user.id = QdH5CAWJgqVT4rOr0qtumf
user.authentication.id = lmolkova

✅ Good:

user.authentication.id = lmolkova
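The good/bad patterns above can be expressed as a simple check. Attribute values alone cannot tell whether `user.id` holds an authenticated identity, so this sketch assumes a hypothetical `authenticated_user` argument carrying the identity the app resolved at login; the function name is also illustrative.

```python
from typing import Optional


def check_user_attributes(attrs: dict, authenticated_user: Optional[str] = None) -> list:
    """Return the problems found; an empty list means the pattern looks fine."""
    problems = []
    user_id = attrs.get("user.id")
    auth_id = attrs.get("user.authentication.id")
    # Bad pattern 1: the same value duplicated in both attributes.
    if user_id is not None and user_id == auth_id:
        problems.append("user.id duplicates user.authentication.id")
    # Bad pattern 2: user.id carries the authenticated identity.
    if authenticated_user is not None and user_id == authenticated_user:
        problems.append("user.id holds the authenticated identity; use user.authentication.id")
    return problems


examples = [
    ("bad", {"user.authentication.id": "lmolkova", "user.id": "lmolkova"}, "lmolkova"),
    ("bad", {"user.id": "lmolkova"}, "lmolkova"),
    ("good", {"user.id": "QdH5CAWJgqVT4rOr0qtumf"}, None),
    ("good", {"user.id": "QdH5CAWJgqVT4rOr0qtumf", "user.authentication.id": "lmolkova"}, "lmolkova"),
    ("good", {"user.authentication.id": "lmolkova"}, "lmolkova"),
]

for label, attrs, who in examples:
    print(label, attrs, check_user_attributes(attrs, who))
```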

Contributor

Maybe another way to say this is that this value SHOULD NOT identify the real user in any way. So this value MUST NOT contain PII which can directly identify the authenticated user.

Contributor

PII is one concern, but my main point is to avoid duplication (you should not generate a new GUID just for the sake of populating user.id) and consistency: everyone should be using user.*.id in the same way.

Contributor Author

heyams, Aug 8, 2024

For the Azure Monitor JavaScript SDK, the browser always has a cookie named ai_user (e.g. ai_user=QdH5CAWJgqVT4rOr0qtumf).
This will be tracked via user.id regardless of whether the user is authenticated or not. Users can choose to use user.authentication.id for the real user's ID/name.

Can we make it not required and remove the condition completely?
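The cookie-based flow described above can be sketched as follows. Only the cookie name `ai_user` comes from the comment; the 22-character alphanumeric ID format (modeled on the example value), the dict standing in for the browser cookie jar, and the function names are assumptions, not the Azure Monitor SDK's actual implementation.

```python
import secrets
import string

# Alphanumeric alphabet, matching the shape of the example ID QdH5CAWJgqVT4rOr0qtumf.
ALPHABET = string.ascii_letters + string.digits


def get_or_create_user_id(cookies: dict) -> str:
    # Reuse the anonymous ID if the browser already has one; otherwise create it.
    # The ID is created before authentication and is never replaced at login.
    if "ai_user" not in cookies:
        cookies["ai_user"] = "".join(secrets.choice(ALPHABET) for _ in range(22))
    return cookies["ai_user"]


def telemetry_attributes(cookies: dict, authenticated_id=None) -> dict:
    attrs = {"user.id": get_or_create_user_id(cookies)}
    if authenticated_id is not None:
        # Also allowed while the login attempt is in progress, if the ID is known.
        attrs["user.authentication.id"] = authenticated_id
    return attrs


cookies = {}  # hypothetical stand-in for the browser cookie jar
before_login = telemetry_attributes(cookies)
login_attempt = telemetry_attributes(cookies, authenticated_id="a.einstein")
after_login = telemetry_attributes(cookies, authenticated_id="a.einstein")
print(before_login, after_login, sep="\n")
```

`user.id` stays the same before and after login, as the `user.id` note requires, while `user.authentication.id` is added once the identity is known.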

Contributor

lmolkova, Aug 8, 2024

I'm missing something.

The condition here is if tracking unauthenticated users is supported, so azmon distro should set user.id based on it.

this condition says nothing about authentication.

Contributor Author

All I'm saying is that user.id is not a required field.

Contributor

would this work?

Suggested change
conditionally_required: If instrumentation supports tracking unauthenticated users and if `user.authentication.id` is set, recommended otherwise.
recommended: If and only if instrumentation supports tracking unauthenticated users.

Contributor

> i'm saying that user.id is not a required field.

It might not be required for this specific use case, but for anything related to users in the OS, user.id is a well-known and important field.

- ref: user.authentication.id
requirement_level: recommended
- ref: enduser.scope
- ref: user.roles
heyams marked this conversation as resolved.
requirement_level: recommended
- id: thread
type: span
18 changes: 16 additions & 2 deletions model/registry/user.yaml
@@ -22,14 +22,28 @@ groups:
brief: >
Unique user hash to correlate information for a user in anonymized form.
note: >
Useful if `user.id` or `user.name` contain confidential information and cannot be used.
Useful if `user.authentication.id` or `user.name` contain confidential information and cannot be used.
examples: ['364fc68eaf4c8acec74a4e52d7d1feaa']
- id: user.id
type: string
stability: experimental
brief: >
Unique identifier of the user.
Identifies a user interacting with a system regardless of user authentication status. This identifier may be unique only through best-effort means.
note: >
The `user.id`, when populated, is expected to be generated before user is authenticated and SHOULD NOT change after the user logs in.
In browser scenarios `user.id` is usually stored in cookies.

It's NOT RECOMMENDED to populate this attribute when unauthenticated users are not tracked or identified by the system.
Contributor

Remove


It can be a random guid or a hash of the user's IP address. This is different from `user.hash` which is a hash of a known `user.id` or `user.name`.
Contributor

sorry, one more minor suggestion to rearrange things slightly (feel free to rephrase or not take it at all)

Suggested change
The `user.id`, when populated, is expected to be generated before user is authenticated and SHOULD NOT change after the user logs in.
In browser scenarios `user.id` is usually stored in cookies.
It's NOT RECOMMENDED to populate this attribute when unauthenticated users are not tracked or identified by the system.
It can be a random guid or a hash of the user's IP address. This is different from `user.hash` which is a hash of a known `user.id` or `user.name`.
The `user.id`, when populated, is expected to be generated before user is authenticated and SHOULD NOT change after the user logs in.
It's NOT RECOMMENDED to populate this attribute when unauthenticated users are not tracked or identified by the system.
In browser scenarios `user.id` is usually stored in cookies. It can also be a random guid or a hash of the user's IP address. This is different from `user.hash` which is a hash of a known `user.authentication.id` or `user.name`.

examples: ['QdH5CAWJgqVT4rOr0qtumf']
- id: user.authentication.id
type: string
brief: Unique identifier of an authenticated user in the system.
note: >
The `user.authentication.id` MAY be used to identify a user attempting to authenticate if it's known at this stage.
examples: ['S-1-5-21-202424912787-2692429404-2351956786-1000']
stability: experimental
- id: user.name
type: string
stability: experimental