[Proposal]: Flyte System Tags and metadata #3320
I would probably call the arg here `--tag` to not confuse this with Kubernetes labels. And then probably assign the tags to k8s annotations.
Great point, but the RPC field is sadly already called `label`, and these will become k8s labels.
If this approach is chosen, I wonder whether it would be nicer for the user to do
instead of
This doesn't mean that under the hood the labels mechanism couldn't be used.
I'm somewhat opposed to this as it could be confusing to users as to what is a label vs what is a keyword cli argument 🤔
Or do we wanna create `group` and `experiment` as CLI arguments and introduce them as a concept? 🤔
I personally prefer option 1: treat all labels the same way. Users might not want to follow the categories we deem sensible. Experiment tracking servers like Mlflow or Wandb, which also have such a tagging mechanism, simply allow users to assign arbitrary tags. I would argue that ML engineers are used to this and we should provide the same UX without imposing special naming conventions.
Only exception: execution name
I find it really helpful to have the pod names include customizable identifiers.
We have a registration script, similar to `pyflyte run`, which has an `--execution_name` arg. The user-provided value is appended with a random uuid, as is currently already chosen for the execution ids, and the result is checked against the execution name regex again and then passed to `FlyteRemote.execute(execution_name=...)` (already supported, see here). So I wouldn't treat execution name with a pod label but the pod's `metadata.name`.
This comment is another argument for not treating execution names with labels but instead `metadata.name`, since I agree that tags need to be mutable.
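For illustration, the name-building step of such a registration script could look roughly like the sketch below. The regex, the length limit, and the helper name `build_execution_name` are assumptions made up for this example; the real execution name rules are whatever Flyte enforces. The resulting string is what would be passed as `execution_name=...` to `FlyteRemote.execute`.

```python
import re
import uuid

# Assumed placeholders, not Flyte's actual rules: adjust to the real
# execution name regex and length limit of your deployment.
EXECUTION_NAME_RE = re.compile(r"[a-z][a-z0-9-]*")
MAX_NAME_LENGTH = 63


def build_execution_name(prefix: str, suffix_length: int = 8) -> str:
    """Append a random suffix to a user-provided prefix and validate the result."""
    suffix = uuid.uuid4().hex[:suffix_length]  # lowercase hex characters only
    name = f"{prefix}-{suffix}"
    if len(name) > MAX_NAME_LENGTH or EXECUTION_NAME_RE.fullmatch(name) is None:
        raise ValueError(f"invalid execution name: {name!r}")
    return name
```

Validating after the suffix is appended (rather than before) catches prefixes that only become too long once the uuid part is added.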
@bstadlbauer @fg91 @elibixby @flixr @goyalankit Some questions for you
Currently, the execution spec (with labels) is serialized to bytes and stored in the execution table, so it's impossible to add / delete / update tags. If we instead use the k8s client to filter FlyteWorkflow (CRD) objects by labels, we cannot search for an execution after the CR is deleted.
I have a PR that adds a tags table. It allows us to easily add / update / delete tags, and even attach tags to a task / workflow / project. However, it doesn't support key-value pair tags for now. If we decide to use key-value pair tags, I just need to add a new column to the tags table and update the query. I'd like to know your thoughts first.
Btw, the current implementation works with both MySQL and Postgres.
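To make the tags-table idea concrete, here is a minimal sketch of key-only tags stored in their own table. This is not the schema from the PR: the table name, column names, and helper functions are all hypothetical, and it uses SQLite only so the example is self-contained (`INSERT OR IGNORE` is SQLite syntax; MySQL and Postgres spell it differently).

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per (execution, tag) pair, key-only tags.
    conn.execute(
        "CREATE TABLE execution_tags ("
        " execution_id TEXT NOT NULL,"
        " tag TEXT NOT NULL,"
        " PRIMARY KEY (execution_id, tag))"
    )


def add_tag(conn: sqlite3.Connection, execution_id: str, tag: str) -> None:
    # Adding the same tag twice is a no-op thanks to the primary key.
    conn.execute(
        "INSERT OR IGNORE INTO execution_tags (execution_id, tag) VALUES (?, ?)",
        (execution_id, tag),
    )


def remove_tag(conn: sqlite3.Connection, execution_id: str, tag: str) -> None:
    conn.execute(
        "DELETE FROM execution_tags WHERE execution_id = ? AND tag = ?",
        (execution_id, tag),
    )


def find_executions(conn: sqlite3.Connection, tag: str) -> list:
    # Search keeps working after the k8s custom resource is gone,
    # because the tags live in the database and not on the CR.
    rows = conn.execute(
        "SELECT execution_id FROM execution_tags WHERE tag = ? ORDER BY execution_id",
        (tag,),
    )
    return [row[0] for row in rows]
```

Because tags are rows rather than part of the serialized execution spec, they can be added or removed at any point in the execution's lifetime.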
My $0.02 is let's keep it simple and support what you call key-only tags.
A person can 'hack' this to resemble key-value if needed (ie 'costcenter-12'), but we don't need to manage that complexity on the back end or in the UI when we get to figuring out how to let folks use tags to sort/group things.
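As a rough sketch of that 'hack': a client could split key-only tags on the last separator to recover a key-value reading by convention, with no backend support at all. The helper names and the `-` separator are assumptions for this example only.

```python
from typing import List, Optional, Tuple


def split_tag(tag: str, sep: str = "-") -> Tuple[str, Optional[str]]:
    """Read 'costcenter-12' as ('costcenter', '12'); plain tags have no value."""
    key, found, value = tag.rpartition(sep)
    if not found or not key or not value:
        return tag, None
    return key, value


def values_for_key(tags: List[str], key: str, sep: str = "-") -> List[str]:
    """Collect the values of all tags that follow the '<key><sep><value>' convention."""
    pairs = (split_tag(tag, sep) for tag in tags)
    return [v for k, v in pairs if k == key and v is not None]
```

The convention lives entirely on the client side, so the backend and UI still only ever see opaque strings.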
In my opinion key-only tags are perfectly fine and what ML engineers are used to from experiment tracking servers
I think being able to add/delete/update tags after the execution has already started or ended is an important feature. User story: an experiment is training/trained really well and I want to mark it for later. This is something that is not known when starting the execution. But updating/deleting/adding tags when the execution is already running would mean that the k8s labels are not in sync with what is stored in the tags table. I'd therefore say that I wouldn't apply the tags as labels to k8s.
Sorry for the late response here, but agreed with what's been said above. Key-only tags would also solve all our use cases 👍
Are users allowed to change the labels? If they are then overriding might be an issue since you might have already fired async events to external systems. So I think it might be useful to maintain executionID as an identifier that can't be modified once execution has been created.
Alternatively, this could be an alias to the execution ID rather than overriding?
executionID cannot be changed - it is immutable and unique per project/domain.
name is just an alias. I will update the doc to reflect this
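A tiny sketch of that split, assuming nothing about the actual Flyte data model: the identifier is a frozen value that is unique per project/domain, while the name is a separate, changeable alias. `ExecutionKey` and `ExecutionRegistry` are hypothetical names for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExecutionKey:
    """Immutable identifier: unique per project/domain, never modified."""
    project: str
    domain: str
    execution_id: str


class ExecutionRegistry:
    """Maps immutable keys to a mutable, human-friendly alias."""

    def __init__(self) -> None:
        self._names = {}

    def create(self, key: ExecutionKey, name: Optional[str] = None) -> None:
        if key in self._names:
            raise ValueError(f"execution already exists: {key}")
        # Fall back to the ID itself when no alias is given.
        self._names[key] = name or key.execution_id

    def rename(self, key: ExecutionKey, name: str) -> None:
        if key not in self._names:
            raise KeyError(key)
        self._names[key] = name  # only the alias changes, never the key

    def display_name(self, key: ExecutionKey) -> str:
        return self._names[key]
```

Events already sent to external systems would reference the key, so renaming the alias later does not invalidate them.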
But I do like the idea of immutable labels as well: once added, you cannot change them.
So we'll support both mutable and immutable labels?