
[Data Frame] Refactor PUT transform to not create a task #39934

Merged

Conversation

benwtrent (Member) commented Mar 11, 2019

Refactor PUT transforms such that:

  • POST _start creates the task and starts it
  • GET transforms queries docs instead of tasks
  • POST _stop verifies the stored config exists before trying to stop
    the task

elasticmachine (Collaborator):

Pinging @elastic/ml-core

@benwtrent benwtrent changed the title [Data Frame] Refactor PUT transform such that: [Data Frame] Refactor PUT transform to not create a task Mar 11, 2019
@hendrikmuhs hendrikmuhs left a comment

Nice! Added some comments

@@ -47,6 +47,7 @@ public void testUsage() throws IOException {

// create a transform
createPivotReviewsTransform("test_usage", "pivot_reviews", null);
startAndWaitForTransform("test_usage", "pivot_reviews");

hendrikmuhs commented:

is this required now? If so, it looks like a bug to me: usage should report the number of transforms created whether they are started or not. In other words, this test should pass without this line.

benwtrent (Member Author):

@hendrikmuhs, I will have to update the usage API with this change then. I will do that and remove this line.

benwtrent (Member Author):

Actually, @hendrikmuhs, looking at DataFrameFeatureSet#usage, it calls the stats endpoint, and until stats are stored in a doc we need the task to exist. I will add a TODO to note this.

hendrikmuhs:

You mean to get this?

        assertEquals(0, XContentMapValues.extractValue("data_frame.stats.index_failures", usageAsMap));

Agreed, this requires the state and I think it's fine to call startAndWaitForTransform in order to get it.

However this test:

assertEquals(1, XContentMapValues.extractValue("data_frame.transforms._all", usageAsMap));

should work.

What about:

        createPivotReviewsTransform("test_usage", "pivot_reviews", null);
        
        usageResponse = client().performRequest(new Request("GET", "_xpack/usage"));

        usageAsMap = entityAsMap(usageResponse);
        // we should see the job
        assertEquals(1, XContentMapValues.extractValue("data_frame.transforms._all", usageAsMap));
        // but no stats
        assertEquals(null, XContentMapValues.extractValue("data_frame.stats", usageAsMap));

        // start it
        startAndWaitForTransform("test_usage", "pivot_reviews");
        // get usage again
        usageResponse = client().performRequest(new Request("GET", "_xpack/usage"));

        usageAsMap = entityAsMap(usageResponse);
        // we should see some stats
        assertEquals(1, XContentMapValues.extractValue("data_frame.transforms._all", usageAsMap));
        assertEquals(0, XContentMapValues.extractValue("data_frame.stats.index_failures", usageAsMap));
        assertEquals(???, XContentMapValues.extractValue("data_frame.stats.documents_indexed", usageAsMap));

benwtrent (Member Author):

@hendrikmuhs, what makes this difficult is that there is no way to get stats at all outside of the allocated persistent task, as they are stored in Indexer.getStats(). Consequently, if the process crashed and had to move to another node, all previous stats would be lost.

I will refactor the usage endpoint, but it will immediately have to be refactored again when we store stats somewhere outside of the running state of the allocated task on the node.

benwtrent (Member Author):

Additionally, I am not sure you want to get all non-running transforms, harking back to your worries about 10k+ configurations.

hendrikmuhs:

👍 and now the fun begins! Excellent find! This is called regularly if usage collection is enabled.

If I remember correctly this summarizes the overall transforms and transforms per state. Makes sense to me, I think we should keep doing this. But now that we store everything in an index we can re-factor (this is pre-index code).

Anyway, can you put this into an issue? No need to solve this as part of this PR.

Additionally I think this isn't only a problem here, other parts of the code count in a similar fashion.

benwtrent (Member Author):

@hendrikmuhs definitely, I will open an issue. I think if we move stats to an index, aggregations will solve this for us, there may be some wonkiness around transforms that are yet to have a task created (do we create a blank stats document when the transform is created?).


client.execute(HasPrivilegesAction.INSTANCE, privRequest, privResponseListener);
} else {
putDataFrame(config, listener);


nit: reading this is a bit counter-intuitive; at first I wondered why the other case doesn't call putDataFrame, then saw it is wrapped. Maybe add a comment here, e.g. "no need to check access rights, go straight to creation".

searchResponse -> {
List<DataFrameTransformConfig> configs = new ArrayList<>(searchResponse.getHits().getHits().length);
for (SearchHit hit : searchResponse.getHits().getHits()) {
DataFrameTransformConfig config = parseTransformLenientlyFromSourceSync(hit.getSourceRef(),

hendrikmuhs commented:

I am worried: what's the size of 10000 config objects? I would not be worried about the XContent, but we parse all of those and create objects including nested objects like AggregationBuilders, QueryBuilders, etc. In addition: for GET we parse every query but do not use it.

What about having different methods? In the end we have 2 usecases:

  • getting exactly 1 config for the purpose of using it (indexer)
  • getting the configuration of one or more transforms for the purpose of returning their XContent (GET); if possible avoid object creation, but at least do not keep 10k objects in a list
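The split suggested above could be sketched roughly as follows. This is a hypothetical simplification, not code from the PR: the class names are illustrative and a plain `Map` stands in for the config index, whereas the real code works against search hits and `DataFrameTransformConfig` objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two retrieval paths: parse a single config
// when the indexer needs to use it, but pass raw JSON through untouched
// when GET only has to echo the stored documents back.
public final class ConfigRetriever {

    private final Map<String, String> configIndex; // id -> stored JSON source

    public ConfigRetriever(Map<String, String> configIndex) {
        this.configIndex = configIndex;
    }

    // Use case 1: the indexer needs one fully parsed, validated config.
    public ParsedConfig getParsedConfig(String id) {
        String source = configIndex.get(id);
        if (source == null) {
            throw new IllegalArgumentException("no transform with id [" + id + "]");
        }
        return ParsedConfig.parse(id, source); // expensive: builds nested objects
    }

    // Use case 2: GET returns the stored sources as-is, no rich-object creation.
    public List<String> getRawConfigSources(List<String> ids) {
        List<String> sources = new ArrayList<>();
        for (String id : ids) {
            String source = configIndex.get(id);
            if (source != null) {
                sources.add(source);
            }
        }
        return sources;
    }

    // Stand-in for DataFrameTransformConfig; real parsing is elided here.
    public static final class ParsedConfig {
        public final String id;
        public final String source;
        ParsedConfig(String id, String source) { this.id = id; this.source = source; }
        static ParsedConfig parse(String id, String source) {
            return new ParsedConfig(id, source);
        }
    }
}
```

The point of the sketch is only that the GET path never instantiates the nested query and aggregation builders it would otherwise have to throw away.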

droberts195 (Contributor):

Making this change would make data frame transforms different to every other endpoint in the code base.

Remember also that in the transport client the GET endpoint needs to return valid objects, not XContent.

If 10000 is too big then the solution that fits with the way everything else works would be to implement pagination and restrict the maximum page size to something less than 10000. But I think that should be done in a different PR (if at all).

benwtrent (Member Author):

I am adding pagination to this endpoint (and stats) eventually and am planning on doing that in a separate PR.

hendrikmuhs:

@droberts195 @benwtrent I am unsure if you understood my point. This wasn't about endpoints but about the methods in this class. Imagine we really have 10k transforms, we are creating a list of 10k objects, those objects have inner objects. These 10k "rich objects" are temporary objects as you call toXContent() on them at the end. Only the string representation is needed, the parsed objects are just side artifacts.

It's correct that the response object has to support wire serialization, however the inner implementation of the response object is completely up to us. There is no need to store a list of DataFrameTransformConfig objects. The requirements of a response object can be fulfilled without storing these "rich objects"[*].

Pagination doesn't solve the problem as long as we keep the size of a single page at 10k (see line 142).

[*] Having that said, I simply do not see a need to return 10k configs at a time. I do not even see the need to solve the >10k transforms usecase in 7.1.

I am fine with "fixing this in an upcoming PR" - but I strongly disagree doing it this way. At line 142 we explicitly set the limit to 10k and as explained above I do not think the implementation is ready to support 10k due to the high memory usage it requires. We should set a lower limit there for the time being and we can "fix this in an upcoming PR". We should not put the system at risk for this! Progress over perfection but in a safe way.

Pagination in combination with a smaller page size sounds like a good alternative, again getting 10k configs at a time, I do not see a need for it.

benwtrent (Member Author):

@hendrikmuhs when searching and getting a result, we will indicate the size and page parameters given (after another refactor to move common code in to xpack.core, see #39976). These paging params will default to size 100 and start: 0, respectively.

Addendum: I am against passing around strings and maps more than necessary. That sacrifices one of the safety guarantees of a statically typed language. I think worrying about this kind of serialization overhead is premature optimization at the cost of compile-time guarantees.

I am fine with lowering the default in this PR for now, but I think the chances of this bringing down a cluster, due to somebody creating 10k transforms between us merging this PR and putting sensible paging parameters in a follow-up PR, are infinitesimal (7.1 is not even released yet).
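The paging defaults discussed above (start at 0, size 100) might be captured in a small holder along these lines. This is an illustrative sketch, not the PR's code; in particular the hard cap of 1000 is an assumption for the example, not something decided here.

```java
// Hypothetical paging parameters with the defaults mentioned above
// (from = 0, size = 100) and an upper bound on page size so a single
// request can never ask for tens of thousands of configs at once.
public final class PageParams {

    public static final int DEFAULT_FROM = 0;
    public static final int DEFAULT_SIZE = 100;
    public static final int MAX_SIZE = 1000; // illustrative cap, not from the PR

    private final int from;
    private final int size;

    public PageParams(Integer from, Integer size) {
        this.from = from == null ? DEFAULT_FROM : from;
        this.size = size == null ? DEFAULT_SIZE : size;
        if (this.from < 0) {
            throw new IllegalArgumentException("[from] must be >= 0");
        }
        if (this.size < 1 || this.size > MAX_SIZE) {
            throw new IllegalArgumentException("[size] must be between 1 and " + MAX_SIZE);
        }
    }

    public int getFrom() { return from; }
    public int getSize() { return size; }
}
```

Validating the bounds in the constructor means the transport layer can reject an oversized page before any search is issued.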

hendrikmuhs:

I am fine if that gets addressed in a PR that hits 7.1, sure. When the discussion started it wasn't clear what "follow up PR" meant (also given that the problem existed prior to this PR).

Pagination with a sensible limit per page sounds great! No need to discuss optimizing the size of the list further once we have that.

dimitris-athanasiou (Contributor) commented Mar 13, 2019:

We need to step away from the idea of returning > 10K results. I think the way forward is to allow searching (i.e. implement the AbstractTransportGetResourcesAction). Then if a user hits > 10K results, we should error. The user can then change the query in order to reduce the matching resources. Anything else is not scalable.

final DiscoveryNodes nodes = state.nodes();

if (nodes.isLocalNodeElectedMaster()) {
if (DataFramePersistentTaskUtils.stateHasDataFrameTransforms(request.getId(), state)) {

hendrikmuhs:

did DataFramePersistentTaskUtils become dead code? If so, can you remove it?

benwtrent (Member Author) commented Mar 12, 2019:

@hendrikmuhs, not yet. Once we have stats in an index instead of cluster state, the stats endpoint can be refactored to not use DataFramePersistentTaskUtils.stateHasDataFrameTransforms any longer. At which point, it will be dead code and can be exorcised from the code base.

new ActionListenerResponseHandler<>(listener, Response::new));
}
}
//TODO support comma delimited and simple regex IDs
A contributor commented:

I am hitting a need for this too for getting sets of data frame analytics jobs. So it would be good if you could add a reusable class for doing this into the X-Pack core library.
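A reusable helper of that kind might look roughly like this. The class and method names are hypothetical; only the comma-delimited IDs and simple `*` wildcard from the TODO above are assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for expanding comma-delimited, wildcard-capable ID
// expressions (e.g. "transform-1,pivot*") against a set of known IDs.
public final class IdExpander {

    // Split the expression on commas and return every known ID matched
    // by any token, preserving first-match order without duplicates.
    public static List<String> expand(String expression, List<String> knownIds) {
        List<String> matched = new ArrayList<>();
        for (String token : expression.split(",")) {
            Pattern pattern = Pattern.compile(toRegex(token.trim()));
            for (String id : knownIds) {
                if (pattern.matcher(id).matches() && !matched.contains(id)) {
                    matched.add(id);
                }
            }
        }
        return matched;
    }

    // Translate the simple "*" wildcard into ".*"; quote everything else
    // so regex metacharacters in IDs are treated literally.
    private static String toRegex(String token) {
        StringBuilder sb = new StringBuilder();
        for (String literal : token.split("\\*", -1)) {
            sb.append(Pattern.quote(literal)).append(".*");
        }
        sb.setLength(sb.length() - 2); // drop the trailing ".*"
        return sb.toString();
    }
}
```

A shared class like this in X-Pack core would let both data frame transforms and data frame analytics resolve ID expressions the same way.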

droberts195 (Contributor) left a comment:

LGTM

I'm happy to merge this and tidy up the loose ends in follow up PRs.

@benwtrent benwtrent merged commit 56f3038 into elastic:master Mar 13, 2019
@benwtrent benwtrent deleted the feature/data-frame-create-task-on-start branch March 13, 2019 19:36
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Mar 13, 2019
* [Data Frame] Refactor PUT transform such that:

 * POST _start creates the task and starts it
 * GET transforms queries docs instead of tasks
 * POST _stop verifies the stored config exists before trying to stop
the task

* Addressing PR comments

* Refactoring DataFrameFeatureSet#usage, decreasing size returned getTransformConfigurations

* fixing failing usage test
benwtrent added a commit that referenced this pull request Mar 13, 2019
…0010)

* [Data Frame] Refactor PUT transform such that:

 * POST _start creates the task and starts it
 * GET transforms queries docs instead of tasks
 * POST _stop verifies the stored config exists before trying to stop
the task

* Addressing PR comments

* Refactoring DataFrameFeatureSet#usage, decreasing size returned getTransformConfigurations

* fixing failing usage test
6 participants