
[ML] add _cat/ml/trained_models API #51529

Merged

Conversation


@benwtrent benwtrent commented Jan 28, 2020

This adds _cat/ml/trained_models.

Certain pieces of data are in the config but do not exist in the stats response. Additionally, it would be nice to conditionally show which data frame analytics job created the model (if the job still exists).

Examples:

# GET _cat/ml/trained_models?v
id                           heap_size operations ingest.pipelines
ddddd-1580216177138          3.5mb     196        0
flight-regress-1580215685537 1.7mb     102        0
lang_ident_model_1           1mb       39629      0

# GET _cat/ml/trained_models?h=*&v
id                           created_by heap_size operations license  create_time              version description                                                    data_frame_analytics_id ingest.pipelines ingest.count ingest.time ingest.current ingest.failed
ddddd-1580216177138              _xpack 3.5mb     196        PLATINUM 2020-01-28T12:56:17.138Z 8.0.0                                                                  ddddd                   0                0            0s          0              0
flight-regress-1580215685537     _xpack 1.7mb     102        PLATINUM 2020-01-28T12:48:05.537Z 8.0.0                                                                  flight-regress          0                0            0s          0              0
lang_ident_model_1               _xpack 1mb       39629      BASIC    2019-12-05T12:28:34.594Z 7.6.0   Model used for identifying language from arbitrary input text. __none__                0                0            0s          0              0

# GET _cat/ml/trained_models?help
id                      |                       | the trained model id                                                          
created_by              | c,createdBy           | who created the model                                                         
heap_size               | hs,modelHeapSize      | the estimated heap size to keep the model in memory                           
operations              | o,modelOperations     | the estimated number of operations to use the model                           
license                 | l                     | The license level of the model                                                
create_time             | ct                    | The time the model was created                                                
version                 | v                     | The version of Elasticsearch when the model was created                       
description             | d                     | The model description                                                         
data_frame_analytics_id | df,dataFrameAnalytics | The data frame analytics config id that created the model (if still available)
ingest.pipelines        | ip,ingestPipelines    | The number of pipelines referencing the model                                 
ingest.count            | ic,ingestCount        | The total number of docs processed by the model                               
ingest.time             | it,ingestTime         | The total time spent processing docs with this model                          
ingest.current          | icurr,ingestCurrent   | The total documents currently being handled by the model                      
ingest.failed           | if,ingestFailed       | The total count of failed ingest attempts with this model                     

The tricky code here is finding the data frame analytics configs that match up with the trained models. If folks request thousands of trained models in this call, and each one has 10+ unique tags, our data frame analytics query could end up too large. I think it is right to throw in that situation (which happens automatically if the paging params are out of bounds). Folks can narrow this request with trained model ids and paging params.
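For instance, a bounded request might look like this (a sketch based on the paging and matching params discussed in this PR; the `flight*` id pattern and the exact query-string spelling are illustrative):

```console
# Page through a bounded slice of matching models instead of fetching all of them
GET _cat/ml/trained_models/flight*?from=0&size=100&allow_no_match=true&v
```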

closes #51414

@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)

Member

@davidkyle davidkyle left a comment


Code LGTM. I'd like a 3rd opinion on what fields are returned and the cat params from a DFAer.

Why not return all trained models instead of just DFA models, then mark which come from a DFA? This could be done with tags, although that would be a problem for existing DFA models that don't have the tag. Maybe do the opposite: tag user-generated models and filter by that tag?

Additionally, conditionally knowing what data frame analytics job created the model
Yes that would be nice

}
GetTrainedModelsStatsAction.Request statsRequest = new GetTrainedModelsStatsAction.Request(modelId);
GetTrainedModelsAction.Request modelsAction = new GetTrainedModelsAction.Request(modelId, false, null);
if (restRequest.hasParam(PageParams.FROM.getPreferredName()) || restRequest.hasParam(PageParams.SIZE.getPreferredName())) {
Member


This is the only cat action that supports paging

Member Author


Correct, and it probably should, since there is a limit of 10k documents when reading configs from an index. Our _cat APIs are the only ones that return data stored in indices (I think).

]
},
"params":{
"allow_no_match":{
Member


cat indices does not have the allow_no_indices option. Is this conventional?

Member Author


This matches up with our GET <resource> pattern.

I could remove the option and make it always true.
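For context, the rest-api-spec entry under discussion might look roughly like this (a sketch, not the exact spec file; types and descriptions are paraphrased):

```json
{
  "params": {
    "allow_no_match": {
      "type": "boolean",
      "required": false,
      "description": "Whether to ignore if a wildcard expression matches no trained models (default: true)"
    },
    "from": {
      "type": "int",
      "description": "skips a number of trained models"
    },
    "size": {
      "type": "int",
      "description": "specifies a max number of trained models to get"
    }
  }
}
```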

Set<String> potentialAnalyticsIds = new HashSet<>();
// Analytics Configs are created by the XPackUser
trainedModelConfigs.stream()
.filter(c -> XPackUser.NAME.equals(c.getCreatedBy()))
Member


I think we want a better way of differentiating user models and DFA models in the future. Maybe a reserved tag for DFA models

Member Author


@davidkyle possibly. Users cannot set the created_by, and XPackUser.NAME is a reserved name.

But I see your point for models we provide as a resource. Those weren't created by a DFA.
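The created_by check quoted above can be sketched in isolation. This is a minimal stand-in, not the real Elasticsearch classes: `ModelConfig` is a hypothetical record replacing `TrainedModelConfig`, and `"_xpack"` is the internal user name visible in the `created_by` column of the example output.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PotentialAnalyticsIds {
    static final String XPACK_USER = "_xpack"; // reserved internal user name

    // Hypothetical stand-in for a trained model config: only the fields we filter on.
    record ModelConfig(String modelId, String createdBy) {}

    // Models created by the internal _xpack user may have come from a data frame
    // analytics job, so their ids are the candidates to look up analytics configs for.
    static Set<String> potentialAnalyticsIds(List<ModelConfig> configs) {
        Set<String> ids = new HashSet<>();
        configs.stream()
            .filter(c -> XPACK_USER.equals(c.createdBy()))
            .map(ModelConfig::modelId)
            .forEach(ids::add);
        return ids;
    }

    public static void main(String[] args) {
        List<ModelConfig> configs = List.of(
            new ModelConfig("flight-regress-1580215685537", "_xpack"),
            new ModelConfig("custom-model", "some_user"));
        // Only the _xpack-created model survives the filter.
        System.out.println(potentialAnalyticsIds(configs));
    }
}
```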


client.execute(GetTrainedModelsStatsAction.INSTANCE,
statsRequest,
ActionListener.wrap(groupedListener::onResponse, groupedListener::onFailure));
Member


Do you need the wrap?

Member Author


Yes, generic types get upset if there is no wrapper.

Map<String, String> analyticsMap = analyticsConfigs.stream()
.map(DataFrameAnalyticsConfig::getId)
.collect(Collectors.toMap(Function.identity(), Function.identity()));
logger.warn("ANALYTICS MAP " + analyticsMap);
Member


left over debug?
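For reference, the `Collectors.toMap(identity, identity)` idiom quoted above in a standalone form (hypothetical ids; in practice a `Set` would express the same membership check):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class AnalyticsIdMap {
    // Build an id -> id lookup so that membership checks and value lookups share
    // one structure. Note Collectors.toMap throws on duplicate keys, so the input
    // ids must be unique.
    static Map<String, String> idMap(List<String> analyticsIds) {
        return analyticsIds.stream()
            .collect(Collectors.toMap(Function.identity(), Function.identity()));
    }

    public static void main(String[] args) {
        System.out.println(idMap(List.of("ddddd", "flight-regress")));
    }
}
```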

@benwtrent
Member Author

I'd like a 3rd opinion on what fields are returned and the cat params from a DFAer.

🤔 interesting, like a data_frame prefixed set of options. I will see what I can do

@davidkyle
Member

I'd like a 3rd opinion on what fields are returned and the cat params from a DFAer.

🤔 interesting, like a data_frame prefixed set of options. I will see what I can do

I meant another review from someone who has worked on the DFA code, but what you're suggesting also sounds good.

@benwtrent
Member Author

Why not return all trained models instead of just DFA models, then mark which come from a DFA.

The _cat API does that. Those that don't have a DFA are flagged as such (dataframe id is __none__).

@benwtrent
Member Author

@elasticmachine update branch

Member

@davidkyle davidkyle left a comment


LGTM

@benwtrent benwtrent removed the request for review from dimitris-athanasiou February 4, 2020 12:01
@Winterflower

What does "the estimated number of operations to use the model" mean?
Sorry, super n00b question, but what operations are we talking about here? The number of times a model has been "called" by a job?

@benwtrent
Member Author

benwtrent commented Feb 4, 2020

What does this mean the estimated number of operations to use the model?
Sorry super n00b question, but what operations are we talking about here? The number of times a model has been "called" by a job?

It is a way to help users gauge model "complexity": an estimate of the number of arithmetic operations needed to run inference with the model.

Having both memory and arithmetic-operation estimates lets users distinguish "simple" from "complex" models.

@Winterflower

@Winterflower

What does this mean the estimated number of operations to use the model?
Sorry super n00b question, but what operations are we talking about here? The number of times a model has been "called" by a job?

It is a way to help users measure model "complexity". It is an estimation of the number of arithmetic operations to use the model in inference.

Having memory + arithmetic operations allows users to make decisions around "simple" vs "complex" models.

@Winterflower

Thanks for the reply @benwtrent ! I originally assumed that you were using the number of operations as a proxy for "model complexity" (as in when we say that a neural network has a higher complexity than a linear model), which is where the confusion seemed to arise. But Valeriy has clarified that you are indeed using the operations number to estimate computational complexity not informational complexity.

@benwtrent benwtrent merged commit 374eca7 into elastic:master Feb 5, 2020
@benwtrent benwtrent deleted the feature/ml-_cat-trainedmodels-api branch February 5, 2020 12:09
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Feb 5, 2020
This adds _cat/ml/trained_models.
benwtrent added a commit that referenced this pull request Feb 5, 2020
* [ML] add _cat/ml/trained_models API (#51529)

This adds _cat/ml/trained_models.
Successfully merging this pull request may close these issues.

Create GET _cat/ml/trained_models API