Goals
Currently, users can upload and deploy many models and then run training, prediction, and other operations on them.
However, managing these models is hard. For example, if users want to categorize models by business scenario and quickly query or filter the models that apply to a specific scenario, there is no convenient way to do so.
Therefore, we introduce the concept of a tag, which lets users attach specific tags to models so that they can manage them better and quickly find the models they need.
ml-commons currently has two system-level indexes, model group and model version. We can allow users to tag both, and let the two share tags with each other.
For example, if we add a tag to a model group, then every model version in the group can use that tag, and each model version can give the same tag a different value. When querying, users can search by tag value to quickly find the model version they need, which is more convenient and faster than querying the models directly and then inspecting the results by eye.
Solution
A tag consists of 3 parts:
* tag key
* tag type (only two types are supported, String and Number; Number is a float in Java)
* tag value
When performing CRUD (Create/Read/Update/Delete) operations on a model group, we operate on the tag key and type. When performing CRUD operations on a model version, we operate on the tag key and value.
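The split between the two levels can be sketched as two small record types. This is only an illustration in Python; the class names `TagType`, `GroupTag`, and `VersionTag` are hypothetical and do not come from ml-commons.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class TagType(Enum):
    """The two tag types allowed by the proposal."""
    STRING = "String"
    NUMBER = "Number"  # a float on the Java side


@dataclass(frozen=True)
class GroupTag:
    """Tag definition kept on a model group: key and type."""
    key: str
    type: TagType


@dataclass(frozen=True)
class VersionTag:
    """Tag assignment kept on a model version: key and value."""
    key: str
    value: Union[str, float]


# A group defines the tag; a version in that group gives it a value.
group_tag = GroupTag(key="scenario", type=TagType.STRING)
version_tag = VersionTag(key="scenario", value="fraud-detection")
```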
Changes to Index Mapping
Since the model group and model version indexes already exist in OpenSearch (`.plugins-ml-model-group` and `.plugins-ml-model`), we first need to modify their mapping definitions to accommodate the new tags field.
In both indexes, the tags field can be defined either as a list whose elements are nested tag objects (with key and type attributes in `.plugins-ml-model-group`, and key and value attributes in `.plugins-ml-model`), or as a map of key-value pairs (key to type in `.plugins-ml-model-group`, and key to value in `.plugins-ml-model`).
If we design tags as a list:
Advantages
* Clearer structure: each tag is an independent object that contains all of its relevant information.
* Easy to query: the attributes of a tag object can be queried directly.
* Adding and deleting tags is easier.
Disadvantages
* Takes up more storage space.
* Aggregation analysis is harder, because the list has to be expanded first.
If we design tags as a map:
Advantages
* Takes up less storage space.
* Aggregation analysis is easier, because statistics can be computed directly on the keys.
Disadvantages
* The structure is less clear: keys and values are scattered, so the information about a tag is not kept in one place.
* Queries are more complex: the key has to be retrieved first, and then the value.
* Adding or deleting a tag requires updating both the key and the value.
We recommend defining the tags field as a list, for the following reasons:
* The extra storage cost is small.
* Tags are added and deleted frequently, and these operations are simpler on a list.
* Querying tag objects directly is more intuitive, and the information stays in one place.
* A map object that holds many key-value pairs can grow too large and put pressure on storage.
* The need for aggregate statistics is limited, so the more complex map structure is not justified.
The mapping of .plugins-ml-model-group index may be:
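As a minimal sketch, the tags part of such a mapping could look like the Python dict below. The `nested` and `keyword` field types are assumptions made for illustration; the proposal only states that each element of the list carries a key and a type.

```python
import json

# Hypothetical tags section of the .plugins-ml-model-group mapping.
# Each list element is a nested object holding the tag key and the tag type.
GROUP_TAGS_MAPPING = {
    "properties": {
        "tags": {
            "type": "nested",
            "properties": {
                "key": {"type": "keyword"},
                "type": {"type": "keyword"},  # "String" or "Number"
            },
        }
    }
}

print(json.dumps(GROUP_TAGS_MAPPING, indent=2))
```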
Why do we define two fields, `value_s` and `value_n`? Consider the following scenario:
If a user adds a tag with a certain type, and a tag with the same key existed in the past with a different type, then adding a tag value to a model version document could raise an exception, because the type of the value conflicts with the field type already recorded in the index mapping.
After testing, we found that this problem does not occur if we define `value_s` and `value_n` in the mapping of the `.plugins-ml-model` index from the beginning. `value_s` holds the value of a tag whose type is String, and `value_n` holds the value of a tag whose type is Number.
As an example, assume that for the `.plugins-ml-model` index we define the mapping as follows:
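A minimal sketch of a mapping with both value fields, written as a Python dict, together with a document that uses each of them. The `nested`, `keyword`, and `float` field types are assumptions for illustration, and the helper `uses_only_mapped_fields` is hypothetical.

```python
# Hypothetical tags section of the .plugins-ml-model mapping.
MODEL_TAGS_MAPPING = {
    "properties": {
        "tags": {
            "type": "nested",
            "properties": {
                "key": {"type": "keyword"},
                "value_s": {"type": "keyword"},  # values of String tags
                "value_n": {"type": "float"},    # values of Number tags
            },
        }
    }
}

# One document carrying a String tag and a Number tag.
sample_document = {
    "tags": [
        {"key": "tag1", "value_s": "abc"},
        {"key": "tag2", "value_n": 1.0},
    ]
}


def uses_only_mapped_fields(document: dict) -> bool:
    """Check that every tag in the document only uses fields present in the mapping."""
    allowed = set(MODEL_TAGS_MAPPING["properties"]["tags"]["properties"])
    return all(set(tag) <= allowed for tag in document["tags"])
```

Because the string and numeric values live in separate fields, neither tag forces the other field to change type.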
With this mapping in place, adding a new document that carries two tags, one with a string value and one with a numeric value, works without problems: the insertion succeeds, and a subsequent call to the search API returns both tags.
The tags field uses the list data structure. In our experiments, match_all, match, term, terms, and range queries all find the expected tags. As a result, when a user modifies the value of a tag, even changing its type, our queries still return the desired results, because a different field is queried for each type: `value_s` when the type is String, and `value_n` when the type is Number.
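The field selection can be sketched as a small query builder. This assumes that tags is mapped as a `nested` field with `key`, `value_s`, and `value_n` sub-fields; the function names are hypothetical.

```python
from typing import Union


def value_field(tag_type: str) -> str:
    """Pick the value field that matches the tag type."""
    if tag_type == "String":
        return "value_s"
    if tag_type == "Number":
        return "value_n"
    raise ValueError(f"unsupported tag type: {tag_type!r}")


def tag_term_query(key: str, tag_type: str, value: Union[str, float]) -> dict:
    """Build a search body that matches documents with the given tag key and value."""
    field = value_field(tag_type)
    return {
        "query": {
            "nested": {
                "path": "tags",
                "query": {
                    "bool": {
                        "must": [
                            {"term": {"tags.key": key}},
                            {"term": {f"tags.{field}": value}},
                        ]
                    }
                },
            }
        }
    }


string_query = tag_term_query("tag1", "String", "abc")
number_query = tag_term_query("tag2", "Number", 1.0)
```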
There are some additional benefits to using these 2 fields, which we describe in detail below:
In the model group scenario, updating a tag generally means updating its type field. Suppose we change the type from String to Number. Without these 2 fields, the value of this tag would have to be updated wherever it is used in the `.plugins-ml-model` index; otherwise the existing values would be incompatible with the new type Number and an error would be reported. With these 2 fields, all existing `value_s` content can remain. After the type is changed to Number, the `value_n` field has no content yet, so the only consequence is that subsequent queries, which now use `value_n`, do not find the old data.
Now consider the model version scenario. Suppose the value of a tag of type String is changed to a number, for example from "abc" to 1.0. We first need to check whether the type of this tag, as defined in the `.plugins-ml-model-group` index, is String or Number. If the type is still String, queries use the `value_s` field, so they will not find the tag whose value was updated to 1.0. If the type has been changed to Number, queries use the `value_n` field, so they will find the tag whose value became 1.0, and not tags with the same key whose values are still strings.
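The lookup behaviour described above can be simulated in memory. This is a sketch only: the model ids and the `find_versions` helper are hypothetical, and a real lookup would run against the index.

```python
def find_versions(versions: dict, key: str, tag_type: str, value) -> list:
    """Return ids of model versions whose tag `key` has `value` in the field for `tag_type`."""
    field = "value_s" if tag_type == "String" else "value_n"
    return [
        model_id
        for model_id, tags in versions.items()
        if any(t.get("key") == key and t.get(field) == value for t in tags)
    ]


versions = {
    "model-a": [{"key": "tag1", "value_s": "abc"}],
    "model-b": [{"key": "tag1", "value_n": 1.0}],  # value was changed from "abc" to 1.0
}

# Group type is still String: only value_s is consulted, so model-b is not found.
still_string = find_versions(versions, "tag1", "String", "abc")

# Group type changed to Number: only value_n is consulted, so model-a is not found.
now_number = find_versions(versions, "tag1", "Number", 1.0)
```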
To summarize, defining these 2 fields keeps subsequent tag queries and updates in the model group and model version scenarios compatible with each other: they do not cause type-compatibility errors, and query results do not drift further from what we expect.
After redefining the mappings for these 2 indexes, let's look at how tags are manipulated in the CRUD operations of model groups and model versions, and design the APIs that may need to be added or modified.
Model Group Scenario
Define the tag key in the model group API, and perform CRUD operations on the key.
We do not need to redesign the query operation for this case, because the existing query API can be used without redefinition or modification. The same holds for the model version scenario, so query operations are not considered there either.
Add Tag
When registering or modifying a model group, we need to be able to create a new tag.
model group register
Registering a new group saves all the selected tag information (key) in the model group index.
See https://opensearch.org/docs/latest/ml-commons-plugin/model-access-control#registering-a-model-group
The tags field of the request body is a list; the earlier sections explain why it is designed that way.
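A minimal sketch of how such a request body could be assembled. The `tags` field is the proposed addition and is not part of the current API; the helper name and the sample values are hypothetical.

```python
import json


def build_group_register_body(name: str, description: str, tags: list) -> dict:
    """Build a register body; `tags` is a list of (key, type) pairs."""
    for key, tag_type in tags:
        if tag_type not in ("String", "Number"):
            raise ValueError(f"tag {key!r} has unsupported type {tag_type!r}")
    return {
        "name": name,
        "description": description,
        # Proposed field: one object per tag, holding its key and type.
        "tags": [{"key": key, "type": tag_type} for key, tag_type in tags],
    }


body = build_group_register_body(
    "test_model_group",
    "A model group for demos",
    [("tag1", "String"), ("tag2", "Number")],
)
print("POST /_plugins/_ml/model_groups/_register")
print(json.dumps(body, indent=2))
```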
model group update
Here we have 2 options: reuse the original model group update API, or create a new API dedicated to tags.
reuse the original API
When we use the model group update API, we specify the tag keys to be added in the request body as a list. The update API for model groups is described at https://opensearch.org/docs/latest/ml-commons-plugin/model-access-control#updating-a-model-group
Note that the tags field here holds the latest list of tags, after the user has performed add, delete, and modify operations on the UI and is about to update the model group. It includes new and modified tags; tags to be deleted do not appear in this list.
The advantages and disadvantages of this approach are as follows:
advantage
* The development effort is relatively small.
disadvantage
* The API may not know which tags were `added`, `modified`, or `deleted` in this model group update.
create a new API
With a new API, the tags list in the request body is likewise the result of the user's add, delete, and modify operations on the UI.
The advantages and disadvantages of this approach are exactly the opposite of those above.
We prefer the first option (reuse the original API), because it requires little code change: we only need to write the latest tags list to the model group.
Update Tag
When modifying a model group, we need to be able to update an existing tag.
model group update
We already discussed how updating the tag's type works in this scenario when describing the benefits of the `value_s` and `value_n` fields.
The latest tags list is, again, the result of the user's add, delete, and modify operations on the UI.
Delete tag
When modifying a model group, we can also delete an existing tag.
model group update
The latest tags list is, again, the result of the user's add, delete, and modify operations on the UI. Therefore, the tags that do not appear in the latest tags list are the ones to delete.
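If the server ever needs to know what changed, it can compare the stored tags with the latest list. The following is a minimal sketch under that assumption; `diff_tags` is a hypothetical helper, not an existing ml-commons function.

```python
def diff_tags(stored: list, latest: list) -> dict:
    """Compare two lists of {"key", "type"} objects and classify the keys."""
    old = {t["key"]: t["type"] for t in stored}
    new = {t["key"]: t["type"] for t in latest}
    return {
        "added": sorted(k for k in new if k not in old),
        "modified": sorted(k for k in new if k in old and new[k] != old[k]),
        "deleted": sorted(k for k in old if k not in new),
    }


stored_tags = [
    {"key": "tag1", "type": "String"},
    {"key": "tag2", "type": "Number"},
    {"key": "tag3", "type": "String"},
]
latest_tags = [
    {"key": "tag1", "type": "Number"},  # type changed
    {"key": "tag2", "type": "Number"},  # unchanged
    {"key": "tag4", "type": "String"},  # new; tag3 is absent, so it is deleted
]

changes = diff_tags(stored_tags, latest_tags)
```

Replacing the stored list with the latest one has the same end result; the diff is only needed when the individual changes matter.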
Model Version Scenario
The Model Group Scenario section explains why query operations need no changes; here we cover adding, updating, and deleting tags.
Add Tag
When registering or modifying a model version, we need to be able to create a new tag.
model version register
Register a new model version in the model version index, saving the key and value of every tag selected from the model group index.
See https://opensearch.org/docs/latest/ml-commons-plugin/api/#registering-a-model
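A minimal sketch of turning the selected tags into the list stored on a model version. It assumes that only keys defined on the model group may be used and that the group's type decides the value field; `build_version_tags` is a hypothetical helper.

```python
def build_version_tags(group_tags: dict, selected: dict) -> list:
    """group_tags maps key -> "String" or "Number"; selected maps key -> value."""
    tags = []
    for key, value in selected.items():
        if key not in group_tags:
            raise KeyError(f"tag {key!r} is not defined on the model group")
        # The type declared on the group decides which value field is written.
        field = "value_s" if group_tags[key] == "String" else "value_n"
        tags.append({"key": key, field: value})
    return tags


group_tags = {"tag1": "String", "tag2": "Number"}
version_tags = build_version_tags(group_tags, {"tag1": "abc", "tag2": 1.0})
```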
model version update
Here there are also 2 options: reuse the original model version update API, or create a new API dedicated to tags.
reuse the original API
However, OpenSearch does not yet provide an API to modify a model version; it is currently under development.
When we use this update API, we specify the tags to be added in the request body as a list. If there are no tags to add, the tags field is simply omitted from the request body.
A possible request body would look like this:
PUT _plugins/_ml/models/<model_id>/update
requestbody:
{
  // other fields
  "tags": [
    {
      "key": "tag1",
      "value_s": "abc"
    },
    {
      "key": "tag2",
      "value_n": 1.0
    }
  ]
  // other fields
}
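The rule that the tags field is left out when there is nothing to send can be sketched as follows. The helper name and the `description` field are hypothetical, and the update API itself is still under development, so this only illustrates how the body is shaped.

```python
from typing import Optional


def build_version_update_body(other_fields: dict, tags: Optional[list] = None) -> dict:
    """Build an update body; the tags field is omitted when no tags are passed."""
    body = dict(other_fields)
    if tags is not None:
        # An explicit empty list is kept in the body; what it should mean
        # on the server side is left open in this sketch.
        body["tags"] = list(tags)
    return body


with_tags = build_version_update_body(
    {"description": "updated description"},
    [{"key": "tag1", "value_s": "abc"}, {"key": "tag2", "value_n": 1.0}],
)
without_tags = build_version_update_body({"description": "updated description"})
```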
As in the model group scenario, we prefer the first option (reuse the original API), for the same reasons.
As described previously, the tags list in this request body is the result of the user's add, delete, and modify operations on the UI.
Update Tag
When modifying a model version, we need to be able to update an existing tag.
model version update
As noted above, this API is still under development, but tag updates can be built on top of it once it is available.
As described previously, the tags list in this request body is the result of the user's add, delete, and modify operations on the UI.
However, we need to pay attention to tags whose value has been modified: how do we handle a value that changes from String to Number, or from Number to String, which should be an error?
Possible impacts:
Suppose a tag is defined in the model group index with type Number, and we want to add or update its value in a related model version. If the user enters a String value, that value is not acceptable.
Measures to address negative impacts:
We already discussed how updating the tag's value works in this scenario when describing the benefits of the `value_s` and `value_n` fields. If the new value is a String, it is written to the `value_s` field; otherwise, the existing `value_n` field is updated. Subsequent queries are not affected.
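A minimal sketch of this write rule: the value field is chosen from the type of the new value, and the other field is left as it was. `apply_value_update` is a hypothetical helper.

```python
def apply_value_update(tag: dict, new_value) -> dict:
    """Return a copy of the tag with the new value stored in the matching field."""
    updated = dict(tag)
    # bool is a subclass of int in Python, so reject it before the numeric check.
    if isinstance(new_value, bool):
        raise TypeError("tag values must be String or Number")
    if isinstance(new_value, str):
        updated["value_s"] = new_value
    elif isinstance(new_value, (int, float)):
        updated["value_n"] = float(new_value)
    else:
        raise TypeError("tag values must be String or Number")
    return updated


number_tag = {"key": "tag2", "value_n": 1.0}

# A String value is added next to the existing number; value_n is untouched.
after_string = apply_value_update(number_tag, "abc")

# A Number value simply overwrites value_n.
after_number = apply_value_update(number_tag, 2.5)
```

Which of the two fields a query reads is still decided by the type recorded on the model group, so a value written to the other field does not disturb existing queries.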
Delete tag
When modifying a model version, we need to be able to delete an existing tag.
model version update
As described previously, the tags list in this request body is the result of the user's add, delete, and modify operations on the UI. Therefore, the tags that do not appear in the latest tags list are the ones to delete.
Additional Context?
Not yet, feel free to help us add to the list, thanks!