[Work_Item] (BUG) Ensure all user-defined Tags are encapsulated within the Tags column #540

Open
cnharris10 opened this issue Sep 11, 2024 · 22 comments
Labels
  • 1.2: Agreed scope for release 1.2
  • backward compatibility: Potentially affects compatibility with past FOCUS releases
  • bug in version: A bug has been identified in one of the existing versions
  • csp: Cloud service providers
  • dimensionality: Fields that describe / group / filter metrics
  • spec revision: Revise existing definition to be clearer or more accurate
  • work item: Issues to be considered for spec development


@cnharris10
Contributor

cnharris10 commented Sep 11, 2024

1. Problem Statement *

What is the problem?: Explain the context and why it needs resolution.
Impact: Describe how the problem affects users, systems, or the project.

In a recent discussion with @AWS-ZachErdman, he mentioned an oversight in the Tags column definition that affects, at minimum, providers with two or more user-defined tagging systems. In this case, AWS (user-defined resource tags, cost categories) and GCP (tags, labels) are affected.

The Tags column currently says: "Providers MUST NOT alter user-defined Tag keys or values." When a provider has multiple user-defined tagging features (N of them) that each allow the same tag key to be created, at least N-1 of those features will need some prefix to prevent clobbering, i.e. one feature's value silently overwriting another feature's value for the same key.

For example, AWS has both user-defined resource tags and user-defined cost categories. If a customer defines a resource tag foo:bar and a cost category foo:baz, then persisting both in the Tags column key/value map will cause clobbering (i.e. either "bar" or "baz" will persist, not both). The same case can occur between GCP tags and labels.
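To make the clobbering concrete, here is a minimal Python sketch, assuming each tagging scheme arrives as a plain key/value dict and is merged into Tags without any namespacing. `merge_unprefixed` is a hypothetical helper, not part of FOCUS or any provider export.

```python
from typing import Dict


def merge_unprefixed(*schemes: Dict[str, str]) -> Dict[str, str]:
    """Merge several tag maps into one, with no namespacing (hypothetical helper)."""
    merged: Dict[str, str] = {}
    for scheme in schemes:
        # A key already present is silently overwritten by the later scheme.
        merged.update(scheme)
    return merged


resource_tags = {"foo": "bar"}       # user-defined resource tag foo:bar
cost_category_tags = {"foo": "baz"}  # user-defined cost category foo:baz

print(merge_unprefixed(resource_tags, cost_category_tags))  # {'foo': 'baz'}
print(merge_unprefixed(cost_category_tags, resource_tags))  # {'foo': 'bar'}
```

Which value survives depends only on merge order, and one of the two user-defined values is lost either way.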

2. Objective *

State the objective of this work item. What outcome is expected?
Success Criteria: Define how success will be measured (e.g. metrics and KPIs).

All user-defined and provider-defined tags are encapsulated within the Tags column, with predefined prefixes preventing clobbering for at least N-1 tagging schemes.

3. Supporting Documentation *

Include links to supporting documents such as:

  • Data Examples: [Link to data or relevant files; DO NOT share proprietary information]
  • Related Use Cases or Discussion Documents: [Link to discussion]
  • PRs or Other References: [Link to relevant references]

Original Tags column definition for FOCUS 1.0: #227
Use Case: Analyze cost and usage by multiple tag structures without guessing which columns contain various tags

4. Proposed Solution / Approach

Outline any proposed solutions, approaches, or potential paths forward. Do not submit detailed solutions; please keep suggestions high-level.

Initial Ideas: Describe potential solution paths, tools, or technologies.
Considerations: Include any constraints, dependencies, or risks.
Feasibility: Include any information that helps quantify feasibility, such as perceived level of effort to augment the spec, or existing fields in current data generator exports.
Benchmarks: Are there established best practices for solving this problem available to practitioners today (e.g. mappings from existing CSP exports that are widely used)?

In the proposed approach, using the AWS CUR as an example, the following tags are considered:

User-defined Tags:

  • Resource tag: foo:bar (i.e. resourceTags/user:foo with value bar)
  • Cost Category tag: foo:bar3 (i.e. costCategories/foo with value bar3)

Provider-defined Tag:

  • System-defined Tag: foo:bar2 (i.e. resourceTags/aws:foo with value bar2)

The proposal is to amend the Tags column to allow a provider-declared prefix to be concatenated with the finalized user-defined tag key for N-1 of the user-defined tagging schemes. This allows one tagging scheme to remain unprefixed, so practitioners can reference that scheme's tags without a prefix.

With the tags supplied above, all Tags can be co-located as either:

Option 1: Predefined prefix declared for N-1 user-defined and all provider tags
The provider declares the prefix costCategories for user-defined cost category tags and the prefix aws for provider-defined system tags.

Tags: { "foo": "bar", "aws:foo": "bar2", "costCategories:foo": "bar3" }

Option 2: Prefix declared for all user-defined and all provider tags
The provider declares the prefix user for user-defined resource tags, costCategories for user-defined cost category tags, and aws for provider-defined system tags.

Tags: { "user:foo": "bar", "aws:foo": "bar2", "costCategories:foo": "bar3" }
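Both options can be sketched in Python, assuming each tagging scheme is available as its own dict and the provider supplies a prefix per scheme (or `None` to leave a scheme unprefixed). The helper `build_tags`, the scheme names used as dict keys, and the `:` separator are illustrative assumptions drawn from the examples above, not spec text.

```python
from typing import Dict, Optional


def build_tags(
    schemes: Dict[str, Dict[str, str]],
    prefixes: Dict[str, Optional[str]],
    separator: str = ":",
) -> Dict[str, str]:
    """Co-locate several tag schemes in one Tags map (hypothetical helper).

    Keys and values are never altered; only a provider-declared prefix is prepended.
    """
    tags: Dict[str, str] = {}
    for scheme, pairs in schemes.items():
        prefix = prefixes[scheme]
        for key, value in pairs.items():
            final_key = key if prefix is None else f"{prefix}{separator}{key}"
            if final_key in tags:
                # More than one unprefixed scheme (or an unlucky key) would clobber.
                raise ValueError(f"collision on {final_key!r}; another prefix is needed")
            tags[final_key] = value
    return tags


aws_schemes = {
    "resourceTags": {"foo": "bar"},     # resourceTags/user:foo
    "systemTags": {"foo": "bar2"},      # resourceTags/aws:foo
    "costCategories": {"foo": "bar3"},  # costCategories/foo
}

# Option 1: resource tags stay unprefixed; the other schemes are prefixed.
option1 = build_tags(
    aws_schemes,
    {"resourceTags": None, "systemTags": "aws", "costCategories": "costCategories"},
)
# Option 2: every scheme carries a prefix.
option2 = build_tags(
    aws_schemes,
    {"resourceTags": "user", "systemTags": "aws", "costCategories": "costCategories"},
)
print(option1)  # {'foo': 'bar', 'aws:foo': 'bar2', 'costCategories:foo': 'bar3'}
print(option2)  # {'user:foo': 'bar', 'aws:foo': 'bar2', 'costCategories:foo': 'bar3'}
```

The collision check also catches a residual case the prefixes alone do not, such as an unprefixed user key that already contains the separator and matches a prefixed key.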

5. Epic or Theme Association

This section will be completed by the Maintainers.

Epic: [Epic Name]
Theme: [Theme Name, if applicable]

TBD

6. Stakeholders *

List the main stakeholders for this issue.

Primary Stakeholders: [Name/Role]
Other Involved Parties: [Names/Roles]

TBD

@cnharris10 cnharris10 added the discussion topic Item or question to be discussed by the community label Sep 11, 2024
@github-project-automation github-project-automation bot moved this to Triage in FOCUS WG Sep 11, 2024
@shawnalpay shawnalpay added dimensionality Fields that describe / group / filter metrics spec revision Revise existing definition to be clearer or more accurate labels Oct 10, 2024
@cnharris10 cnharris10 added the 1.2 consideration To be considered for release 1.2 label Oct 11, 2024
@shawnalpay shawnalpay added backward compatibility Potentially affects compatibility with past FOCUS releases needs work item Needs an issue that adheres to the Work Item issue template, prior to consideration by stakeholders labels Oct 16, 2024
@jpradocueva
Contributor

Summary TF-2 call on Oct 16:

#540 [DISCUSSION]: Tags Column Definition and User-Defined Tags
Key Discussion Items: The discussion focused on the requirement that user-defined tags cannot be altered, which could lead to issues with normalization and denormalization.
Problem Identification: Some cloud providers (AWS, specifically) have multiple tag schemes, leading to complications in enforcing a strict tag policy.
Divergent Views: The group debated whether changing the definition would introduce a breaking change.
Final Agreement: Chris will create a work item to formalize the issue, advocating for a potential change in the 1.2 release.
Action Items:
[TF-2-#540] Chris, @cnharris10 will handle creating the work item for this issue.

@cnharris10 cnharris10 removed the needs work item Needs an issue that adheres to the Work Item issue template, prior to consideration by stakeholders label Oct 17, 2024
@shawnalpay shawnalpay added work item Issues to be considered for spec development and removed discussion topic Item or question to be discussed by the community labels Oct 17, 2024
@shawnalpay
Contributor

@cnharris10 Spent some time with this one. I get it now! But I have some feedback. :)

  • Like with my feedback on [Work_Item] Add support for coverage eligibility #406: now that this is a Work Item instead of a Discussion Topic: I recommend you change the title to be an action on a concept rather than a problem or solution statement. Something like Support provision of multiple user-defined Tag systems.
  • Section 1: at least one use case in this format would be helpful. Perhaps Analyze cost and usage by multiple tag structures.
  • Your AWS example of resource tags and cost categories: I think it would be really helpful to link to some docs that describe them. For example, here and here.
  • Your use of the word "clobbering" is accurate, of course -- but it's a technical term, and I didn't know what it meant upon first read; I'm sure many of our stakeholders will be in a similar boat. Could you briefly define it upon first use, or link to a page that defines it? Same feedback for use of the term "N-1".
  • Linking to the Tags column definition in Section 1 would be helpful.
    • Also, the quote you pull from the Tags column (Providers MUST NOT alter user-defined Tag keys or values.) would be good to encapsulate with quotes or quote markdown or similar. It's currently not clear where the quote ends and your commentary begins.
  • Do you perceive there to be a way to implement this without making a material change to the column definition? If not, then it would be worth mentioning in the write-up that this would result in a change to the composition of this column.
  • I think you mentioned in a call this week that AWS implemented a way around this. Is that true? In Section 3, it would be a helpful reference to see how one or more providers are handling -- or NOT handling -- these issues.

If you feel this level of detail is unnecessary and/or I'm being pedantic, I can appreciate that -- but our audience for these issues is expanding beyond the FOCUS project team, and any/all context will be helpful for someone getting up to speed on this (even a dense Maintainer such as myself!).

@shawnalpay shawnalpay added the needs use case Needs a description of the why (use case or other problem to solve) label Oct 17, 2024
@jpradocueva
Contributor

Summary from Members' call on Oct 17:

#540 [DISCUSSION]: Tags Column Definition Mandates that User-Defined Tags are Not Altered, Which Can Lead to Various Scenarios
Primary Issue: This discussion revolves around the current requirement in the specification that user-defined tags must not be altered. The concern is that this rule could lead to complications when practitioners deal with multiple user-defined tag schemes from different providers.
Core Problem: AWS, for instance, allows both user-defined resource tags and user-defined cost categories, which could result in conflicts when both types of tags share the same key names. The current specification does not adequately address how to differentiate these multiple tag schemes without altering the user-defined tags.
Divergent Views: Some members felt that allowing providers to prepend a prefix to user-defined tags could resolve the issue without altering the tags themselves, while others expressed concern that introducing prefixes would increase complexity and make tag management harder for practitioners. There was also debate about whether this could be considered a breaking change.
Final Agreement: The group agreed to explore solutions that would allow providers to prepend a prefix for certain user-defined tag schemes (e.g., cost categories) without altering other user-defined tags. However, this should be carefully reviewed to ensure that it doesn't introduce complexity or conflicts for practitioners. This Issue #540 represents the first “work item” to be prepared by the group.
Action Items:

  • [Members-#540] Chris @cnharris10 : Draft a proposal for handling multiple user-defined tag schemes without altering tags, including the use of prefixes where appropriate.
  • [Members-#540] Chris @cnharris10 : Review the potential impact of this proposal to determine if it constitutes a breaking change.

@cnharris10 cnharris10 changed the title [DISCUSSION]: Tags column definition mandates that user-defined tags are not altered which can lead to various scenarios. [Work_Item]: [BUG] Ensure all user/provided-based Tags are encapsulated within the Tags column Oct 18, 2024
@cnharris10 cnharris10 changed the title [Work_Item]: [BUG] Ensure all user/provided-based Tags are encapsulated within the Tags column [Work_Item]: (BUG) Ensure all user/provided-based Tags are encapsulated within the Tags column Oct 18, 2024
@cnharris10 cnharris10 removed the needs use case Needs a description of the why (use case or other problem to solve) label Oct 20, 2024
@shawnalpay shawnalpay added the bug in version A bug has been identified in one of the existing versions label Oct 21, 2024
@shawnalpay shawnalpay changed the title [Work_Item]: (BUG) Ensure all user/provided-based Tags are encapsulated within the Tags column [Work_Item]: (BUG) Ensure all user-based Tags are encapsulated within the Tags column Oct 21, 2024
@shawnalpay shawnalpay changed the title [Work_Item]: (BUG) Ensure all user-based Tags are encapsulated within the Tags column [Work_Item]: (BUG) Ensure all user-defined Tags are encapsulated within the Tags column Oct 21, 2024
@thecloudman
Contributor

I use GCP and Azure. In GCP we have labels and tags, and some keys match between labels and tags. In our FOCUS dataset we don't have any issues showing the key/values from both labels and tags. Might need to do some more investigation into this one.

@shawnalpay
Contributor

@thecloudman Interesting; thanks for sharing.

@cnharris10 @AWS-ZachErdman Do we have real-world examples of this happening, and if so, could you share? It may be difficult to get the stakeholders to prioritize this one if it's not perceived to be a problem.

@shawnalpay shawnalpay added the needs examples Needs data to illustrate the issue label Oct 23, 2024
@cnharris10
Contributor Author

cnharris10 commented Oct 23, 2024

@thecloudman

A couple questions:

  1. For GCP exports, are you saying that when you have a tag, foo:bar, and a label: foo:baz, you are fine with (non-deterministically) the Tags column manifesting as either {"foo": "bar"} or {"foo": "baz"} and losing the other entry?

  2. To mitigate this clobbering issue, AWS supplies user-defined resource tags within the Tags column and also creates a provider-defined column, AWS_CostCategories, that encapsulates their other user-defined (Cost Category) tags. This ensures that the example from the previous question doesn't occur.

If providers follow this approach, then providers will encapsulate some user-defined tags under the standard Tags column and the rest under 1 or more provider-based columns (ex: x_MyOtherTags). In this case, with 3 hypothetical providers going this route (Provider1, Provider2, Provider3), 4 columns will be produced causing practitioners to look/query across various, non-normalized columns for user-defined tags.

Example:

  • Tags: { "foo": "bar" }
  • x_Provider1_OtherUserDefinedTags: { "foo": "bar2" }
  • x_Provider2_OtherUserDefinedTags: { "foo": "bar3" }
  • x_Provider3_OtherUserDefinedTags: { "foo": "bar4" }

The intent of the Tags column for 1.0 was to encapsulate all tags under one column to allow an easy querying experience regardless of provider.
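A rough illustration of the querying burden, assuming a hypothetical row shaped like the example above. To find every value for a key, a practitioner has to guess which provider-specific columns hold tags; here that guess is an assumed naming rule ("starts with `x_` and ends with `Tags`"). With all schemes namespaced inside one Tags map (the `provider1:`-style prefixes are hypothetical), a single column suffices.

```python
from typing import Dict, List

# Hypothetical row mirroring the example above: one standard column plus
# provider-specific columns that also hold user-defined tags.
row: Dict[str, object] = {
    "Tags": {"foo": "bar"},
    "x_Provider1_OtherUserDefinedTags": {"foo": "bar2"},
    "x_Provider2_OtherUserDefinedTags": {"foo": "bar3"},
    "x_Provider3_OtherUserDefinedTags": {"foo": "bar4"},
}


def values_for_key(row: Dict[str, object], key: str) -> List[str]:
    """Collect every value for `key` across Tags and any column guessed to hold tags."""
    found: List[str] = []
    for column, value in row.items():
        # Assumed naming rule; a real query would need each provider's column list.
        is_tag_column = column == "Tags" or (column.startswith("x_") and column.endswith("Tags"))
        if is_tag_column and isinstance(value, dict) and key in value:
            found.append(value[key])
    return found


print(values_for_key(row, "foo"))  # ['bar', 'bar2', 'bar3', 'bar4']

# With every scheme namespaced inside the single Tags column, one map is enough.
single_tags = {"foo": "bar", "provider1:foo": "bar2", "provider2:foo": "bar3", "provider3:foo": "bar4"}
matches = {k: v for k, v in single_tags.items() if k == "foo" or k.endswith(":foo")}
print(matches)
```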

@rileyjenk
Contributor

I just tested this with our own data, and there is a potential collision if multiple mechanisms produce the same keys. If a provider has multiple systems that supply keys and values to the Tags column, then it needs to either:

  • (Provider) prevent collisions on entry (pretty large scope here but could be possible)
  • (Provider) pick a winner (problematic)
  • (Spec) As proposed above, have the spec allow keys to be prefixed with their context to prevent collisions.
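For the first option (preventing collisions on entry), a provider would first need to detect keys shared across schemes. A minimal sketch, assuming each scheme is a plain dict; `find_collisions` is a hypothetical helper. It reports a shared key even when the values match, because merging would still lose which scheme the pair came from.

```python
from collections import defaultdict
from typing import Dict, Set


def find_collisions(schemes: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Map each key used by more than one scheme to the set of schemes using it."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for scheme, pairs in schemes.items():
        for key in pairs:
            owners[key].add(scheme)
    # Shared keys are reported even when values match: provenance would still be lost.
    return {key: names for key, names in owners.items() if len(names) > 1}


example = {
    "resourceTags": {"foo": "bar", "team": "data"},
    "costCategories": {"foo": "baz"},
}
print(find_collisions(example))  # reports "foo" as shared by both schemes
```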

@udam-f2
Contributor

udam-f2 commented Oct 23, 2024

I also see the need for this.

The spec allowing for namespacing to avoid these collisions seems like the preferable approach here.

@cnharris10 cnharris10 removed the needs examples Needs data to illustrate the issue label Oct 24, 2024
@shawnalpay shawnalpay changed the title [Work_Item]: (BUG) Ensure all user-defined Tags are encapsulated within the Tags column [Work_Item] (BUG) Ensure all user-defined Tags are encapsulated within the Tags column Oct 24, 2024
@ijurica
Contributor

ijurica commented Oct 25, 2024

This is an oversight in the specification, and we need to resolve it.

@AWS-ZachErdman
Contributor

@cnharris10 this is mainly a problem with respect to cost categories having its own column, and should not be related to the gap that we listed in our user guide for our preview specification.

The most compelling problem explanation and argument for me about why we should reconsider this definition is the argument you gave here:

> If providers follow this approach, then providers will encapsulate some user-defined tags under the standard Tags column and the rest under 1 or more provider-based columns (ex: x_MyOtherTags). In this case, with 3 hypothetical providers going this route (Provider1, Provider2, Provider3), 4 columns will be produced causing practitioners to look/query across various, non-normalized columns for user-defined tags.
>
> Example:
>
>   • Tags: { "foo": "bar" }
>   • x_Provider1_OtherUserDefinedTags: { "foo": "bar2" }
>   • x_Provider2_OtherUserDefinedTags: { "foo": "bar3" }
>   • x_Provider3_OtherUserDefinedTags: { "foo": "bar4" }
>
> The intent of the Tags column for 1.0 was to encapsulate all tags under one column to allow an easy querying experience regardless of provider.

@shawnalpay shawnalpay added the csp Cloud service providers label Oct 29, 2024
@jpradocueva
Contributor

Notes from the Maintainers' call on November 4:

Context: Practitioners face challenges in tracking resources due to overlaps in tagging structures across providers. This work item seeks to address inconsistencies and potential conflicts in tags by standardizing guidance on user-defined tags.
Level of Effort Required: Low — Addressing the tagging structure is manageable but may require coordination with providers to align on consistent tagging practices.

@shawnalpay shawnalpay added the 1.2 Agreed scope for release 1.2 label Nov 18, 2024
@shawnalpay shawnalpay added this to the v1.2 milestone Nov 25, 2024
@jpradocueva
Contributor

Summary from the Maintainers' call on Nov 25

Context:
This item ensures that all user-defined tags are captured within a specific dataset field, maintaining consistency and simplifying data parsing.
Maintainers Assigned:
Zach, Chris, Riley, Tim Wright
Task Force Assigned:
Task Force 1 (TF1).

@daviddinhgcp

We don't have this problem in GCP as label keys and tag keys must be unique. In the billing data, tags and labels have separate columns or fields. Although one could have the same key for a label and a tag assigned to the same resource, it would be accurately reflected in the respective columns. GCP also doesn't have a particular schema for tags and labels. GCP also uses system labels; these are auto-generated metadata that apply to certain resources. System labels are also reported in separate columns.

@shawnalpay shawnalpay removed the 1.2 consideration To be considered for release 1.2 label Nov 27, 2024
@cnharris10
Contributor Author

> We don't have this problem in GCP as label keys and tag keys must be unique. In the billing data, tags and labels have separate columns or fields. Although one could have the same key for a label and a tag assigned to the same resource, it would be accurately reflected in the respective columns. GCP also doesn't have a particular schema for tags and labels. GCP also uses system labels; these are auto-generated metadata that apply to certain resources. System labels are also reported in separate columns.

Thanks @daviddinhgcp for the added context. For tags/labels, since keys can be the same and Tags is effectively a Map data type, we'll need a way to differentiate - likely allowing prefixes for user-generated k/v pairs.

For system labels, these are "provider" tags, and the guidance is to add a provider-provided prefix to these k/v pairs to ensure no clobbering

@thecloudman
Contributor

Another option: organisations that use two or more tagging schemes should have a tagging standard that defines the keys and values for each type of tag. For example, Google Cloud has labels and tags, both of which can be user-defined. GCP labels are the same as Azure tags, whereas GCP tags are used for automation, assigning policies to resources, and so on. Therefore, an organisation should have a tagging standard that defines the difference between the two types and clearly outlines the requirements for each. Providers must not prefix user-defined tags or labels. Providers must prefix provider-defined tags and labels.
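A minimal sketch of the rule proposed in this comment, assuming user-defined schemes and provider-defined tags arrive as separate dicts: user-defined keys are left unprefixed, provider-defined keys always get the provider's prefix, and any overlap is surfaced as a breach of the organisation's tagging standard rather than silently overwritten. The function name, the `system` prefix, and the example keys are hypothetical.

```python
from typing import Dict


def build_tags_org_standard(
    user_schemes: Dict[str, Dict[str, str]],
    provider_tags: Dict[str, str],
    provider_prefix: str,
) -> Dict[str, str]:
    """Merge tags with unprefixed user-defined keys and prefixed provider-defined keys."""
    tags: Dict[str, str] = {}
    for scheme, pairs in user_schemes.items():
        for key, value in pairs.items():
            # Under this option, the org's tagging standard must keep user-defined keys disjoint.
            if key in tags:
                raise ValueError(f"user-defined key {key!r} in {scheme!r} is already used by another scheme")
            tags[key] = value
    for key, value in provider_tags.items():
        prefixed = f"{provider_prefix}:{key}"
        if prefixed in tags:
            raise ValueError(f"provider key {prefixed!r} collides with a user-defined key")
        tags[prefixed] = value
    return tags


tags = build_tags_org_standard(
    {"labels": {"env": "prod"}, "tags": {"cost-center": "42"}},
    {"machine": "n1"},
    provider_prefix="system",
)
print(tags)  # {'env': 'prod', 'cost-center': '42', 'system:machine': 'n1'}
```

The trade-off is that the merge fails instead of clobbering when the standard is not followed, which moves the burden from the spec onto each organisation.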

@jpradocueva
Contributor

Action Items from the Members' call on December 5:

  • [#540] Chris @cnharris10 : Aggregate poll results and provide a recommendation for addressing the encapsulation bug based on group feedback.
  • [#540] Volunteer: Document examples of current user-defined tag conflicts for inclusion in the resolution proposal.
  • [#540] Irena @ijurica : Draft a proposal outlining the phased approach, ensuring backward compatibility for existing implementations.
  • [#540] All members: Review and comment on the proposed phased approach before the next meeting.

@timwright2000

> Thanks @daviddinhgcp for the added context. For tags/labels, since keys can be the same and Tags is effectively a Map data type, we'll need a way to differentiate - likely allowing prefixes for user-generated k/v pairs.
>
> For system labels, these are "provider" tags, and the guidance is to add a provider-provided prefix to these k/v pairs to ensure no clobbering

@cnharris10 sorry, I can't tell if you are saying you think we may still have an issue with the Google tags and labels data for some reason? Perhaps we could have collisions across labels and tags (even though labels have to be unique wrt other labels and tags have to be unique wrt other tags)? Or something else? Don't the prefixes take care of this?

@cnharris10
Contributor Author

cnharris10 commented Dec 12, 2024

> @cnharris10 sorry, I can't tell if you are saying you think we may still have an issue with the Google tags and labels data for some reason? Perhaps we could have collisions across labels and tags (even though labels have to be unique wrt other labels and tags have to be unique wrt other tags)? Or something else? Don't the prefixes take care of this?

@timwright2000 Because tags and labels are both user-defined, and the current guidance says user-defined FOCUS tags cannot be given a prefix to namespace colliding keys, the problem exists within GCP.

@timwright2000

@cnharris10 Still getting stuck here... For GCP, user-entered labels are unique and user-entered tags are unique. May be easier to discuss live.

@jpradocueva
Contributor

Action Items from the Maintainers' call on December 16:

  • [#540] Chris @cnharris10 : to review feedback on poll options in TF1, see if we can align on prevailing option; if we get alignment, then we can move forward on PR

@jpradocueva
Contributor

Action Items from the TF-1 meeting on December 17:

  • [#540] Chris @cnharris10 : Follow up with Graham to clarify and document the fourth proposed solution.
  • [#540] Tim @timwright2000 : Bring GCP sample data showing its current implementation of tags.
  • [#540] Tim @timwright2000 : to prepare an issue in the backlog indicating an alternative path forward to the current solution.
  • [#540] Group: Discuss poll results and pros/cons of each option in a shared document.
  • [#540] Shawn @shawnalpay : Explore ways to increase engagement on future polls.

Projects
Status: Parking Lot
Development

No branches or pull requests