-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] add _cat/ml/trained_models API #51529
[ML] add _cat/ml/trained_models API #51529
Conversation
Pinging @elastic/ml-core (:ml) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM. I'd like a 3rd opinion on what fields are returned and the cat params from a DFAer.
Why not return all trained models instead of just DFA models, then mark which come from a DFA. This could be done with tags although that would be a problem for existing DFA models that don't have this tag. Maybe do the opposite and tag user generated models then filter by that tag?
Additionally, conditionally knowing what data frame analytics job created the model
Yes that would be nice
} | ||
GetTrainedModelsStatsAction.Request statsRequest = new GetTrainedModelsStatsAction.Request(modelId); | ||
GetTrainedModelsAction.Request modelsAction = new GetTrainedModelsAction.Request(modelId, false, null); | ||
if (restRequest.hasParam(PageParams.FROM.getPreferredName()) || restRequest.hasParam(PageParams.SIZE.getPreferredName())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only cat action that supports paging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, and it probably should as there is a limit of 10K when reading configs from an index.
Our _cat
APIs are the only ones that return data that are stored in indices (i think)
] | ||
}, | ||
"params":{ | ||
"allow_no_match":{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cat indices does not have the allow_no_indices
option. Is this conventional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This matches up with our GET <resource>
pattern.
I could remove the option and make it always true.
Set<String> potentialAnalyticsIds = new HashSet<>(); | ||
// Analytics Configs are created by the XPackUser | ||
trainedModelConfigs.stream() | ||
.filter(c -> XPackUser.NAME.equals(c.getCreatedBy())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want a better way of differentiating user models and DFA models in the future. Maybe a reserved tag for DFA models
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidkyle possibly. Users cannot set the created_by
, and XPackUser.NAME
is a reserved name.
But I see your point for models we provide as a resource. Those weren't created by a DFA.
|
||
client.execute(GetTrainedModelsStatsAction.INSTANCE, | ||
statsRequest, | ||
ActionListener.wrap(groupedListener::onResponse, groupedListener::onFailure)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need the wrap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, generic types get upset if there is no wrapper.
Map<String, String> analyticsMap = analyticsConfigs.stream() | ||
.map(DataFrameAnalyticsConfig::getId) | ||
.collect(Collectors.toMap(Function.identity(), Function.identity())); | ||
logger.warn("ANALYTICS MAP " + analyticsMap); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left over debug?
🤔 interesting, like a |
I was meaning another review from someone who's worked on the DFA code, but what your suggesting also sounds good |
The |
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
What does this mean |
It is a way to help users measure model "complexity". It is an estimation of the number of arithmetic operations to use the model in inference. Having memory + arithmetic operations allows users to make decisions around "simple" vs "complex" models. |
Thanks for the reply @benwtrent ! I originally assumed that you were using the number of operations as a proxy for "model complexity" (as in when we say that a neural network has a higher complexity than a linear model), which is where the confusion seemed to arise. But Valeriy has clarified that you are indeed using the operations number to estimate computational complexity not informational complexity. |
This adds _cat/ml/trained_models.
This adds
_cat/ml/trained_models
.Certain pieces of data are in the config that do not exist in the stats response. Additionally, conditionally knowing what data frame analytics job created the model (if the job still exists) would be nice information.
Examples:
The tricky code here is finding dataframe analytics configs that match up with the trained models. If folks want to get 1000s of trained models in this call, and each one has 10+ unique tags, it could be that our dataframe analysis query is too large. I think it is good to throw in that situation (which is done automatically if the paging params are out of bounds). Folks can filter down this request with trained model ids and paging params.
closes #51414