[ML] Improve response format of data frame stats endpoint #44350

droberts195 · 2019-07-15T14:31:03Z

This change adjusts the data frame transforms stats
endpoint to return a structure that is easier to
understand.

This is a breaking change for clients of the data frame
transforms stats endpoint, but the feature is in beta so
stability is not guaranteed.

Closes #43767

elasticmachine · 2019-07-15T14:31:06Z

Pinging @elastic/ml-core

This still doesn't work as the checkpointing info is always empty in the state-and-stats docs. This commit makes that clearer by not actually persisting empty checkpointing info in the state-and-stats docs, but instead hardcoding this at the point of use. The next step is then to get non-empty checkpointing info when required...

benwtrent · 2019-07-19T13:52:19Z

...st/java/org/elasticsearch/client/dataframe/transforms/hlrc/DataFrameTransformStatsTests.java

+            randomFrom(DataFrameTransformTaskState.values()),
+            randomBoolean() ? null : randomAlphaOfLength(100),
+            randomBoolean() ? null : randomNodeAttributes(),
+            // On the server side the stats has transform ID "_all" when embedded in another doc


Couldn't this internal ID field be skipped while serializing the XContent?

It is skipped when serializing. The problem is that this test creates a server-side object, round-trips that through the HLRC back to another server-side object, then asserts equality. After the round-tripping via serialization that discards the ID it ends up as _all. Hence the original object has to be created with an ID of _all so that it is considered equal.
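
The equality problem described above can be reproduced in miniature. The following is only a sketch under stated assumptions: `SketchIndexerStats`, its string-based "serialization", and `RoundTripIdDemo` are hypothetical stand-ins, not the real server-side or HLRC classes. It shows why an object whose ID is dropped on serialization only compares equal after a round trip if it started out with the default ID `_all`.

```java
import java.util.Objects;

// Hypothetical stand-in for the indexer stats object; not the real Elasticsearch class.
final class SketchIndexerStats {
    static final String DEFAULT_ID = "_all";

    final String transformId;
    final long pagesProcessed;

    SketchIndexerStats(String transformId, long pagesProcessed) {
        this.transformId = transformId;
        this.pagesProcessed = pagesProcessed;
    }

    // Embedded serialization: the ID is deliberately left out.
    String toEmbeddedForm() {
        return Long.toString(pagesProcessed);
    }

    // Parsing cannot recover the omitted ID, so it falls back to the default.
    static SketchIndexerStats fromEmbeddedForm(String serialized) {
        return new SketchIndexerStats(DEFAULT_ID, Long.parseLong(serialized));
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof SketchIndexerStats)) return false;
        SketchIndexerStats that = (SketchIndexerStats) other;
        return pagesProcessed == that.pagesProcessed && transformId.equals(that.transformId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(transformId, pagesProcessed);
    }
}

final class RoundTripIdDemo {
    // Serializes, parses back, and compares with the original.
    static boolean survivesRoundTrip(SketchIndexerStats original) {
        SketchIndexerStats copy = SketchIndexerStats.fromEmbeddedForm(original.toEmbeddedForm());
        return original.equals(copy);
    }

    public static void main(String[] args) {
        // A real ID is lost in transit, so equality fails.
        System.out.println(survivesRoundTrip(new SketchIndexerStats("my-transform", 42)));
        // Starting with "_all" makes the round trip lossless.
        System.out.println(survivesRoundTrip(new SketchIndexerStats("_all", 42)));
    }
}
```

In this sketch the first call prints `false` and the second prints `true`, which mirrors why the test has to construct its original object with `_all`.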

You're also correct that the cleanest solution with the way the rest of the code is today is just not to store a transform ID in the indexer stats. At present when it's persisted to an index it's always inside a state-and-stats wrapper. There is an idea to split the storage so that state and stats are persisted separately. Obviously if that were done then we'd need an ID in the stats during persistence and this problem would come back. But maybe it is best to leave the state and stats wrapped in a single document when they're persisted. Grouping them may avoid consistency problems where the stats endpoint response includes old state and new stats (or vice-versa) for a particular transform due to the two docs becoming searchable at different times. So in fact keeping state and stats grouped during persistence can solve two problems:

- Guarantees consistency of state and indexer stats incorporated into any single stats endpoint response
- Allows the complex/confusing code related to the sometimes-present ID in indexer stats to be deleted

Any thoughts @hendrikmuhs and @davidkyle?

+1

Our discussions convinced me that we should stay with storing state and stats together. As explained above this avoids consistency problems and in addition we do not have to break backwards compatibility. The deletion of superfluous fields should not break BWC.

benwtrent · 2019-07-19T13:56:36Z

...n/java/org/elasticsearch/xpack/core/dataframe/transforms/DataFrameIndexerTransformStats.java

@@ -133,7 +132,6 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
                throw new IllegalArgumentException("when storing transform statistics, a valid transform id must be provided");
            }
            builder.field(DataFrameField.ID.getPreferredName(), transformId);


Do we even need to serialize this ID any longer? Or could we have some sort of parameter that is passed by the containing stats object?

This is a good point. The purpose of the ID was to allow the corresponding documents to be deleted (via delete-by-query) after a transform is deleted. As stats are no longer stored as standalone documents, the ID is as superfluous as the doc type, so the whole internal storage handling is obsolete.

(We can also clean up in a later PR if preferred, this is already getting complex)

I will do this in a followup PR. It will also mean the assertion that IDs are identical on merge will have to go, but I guess that was only a debugging aid - it didn't do anything in production.

benwtrent · 2019-07-19T14:04:54Z

...ava/org/elasticsearch/xpack/dataframe/action/TransportGetDataFrameTransformsStatsAction.java

+
+    private void populateSingleStoppedTransformStat(DataFrameTransformStoredDoc transform,
+                                                    ActionListener<DataFrameTransformCheckpointingInfo> listener) {
+        transformsCheckpointService.getCheckpointStats(transform.getId(), transform.getTransformState().getCheckpoint(),


I think we are going to find that this will hurt performance in the future. I know refactoring getCheckpointStats would be a monstrous undertaking, and it will probably have to be done in another PR.

Yes, true, I should probably write a new method that gets all the required checkpoint docs for all the stopped transforms in one step.

Also getting the "operations behind" for many transforms that share the same source indices is inefficient because the indices stats action sends messages to the nodes holding the shards for each index that's involved and we'd do that multiple times. So a more efficient approach would be to get the indices stats for the union of all indices referenced in the source of all transforms we're going to return stats for, then pick the relevant bits out of that for each transform.

The approach in this PR at the moment is actually what's suggested in #42978 but further optimisation would certainly be possible. Since this PR is targeted at 7.4 we have time to do this in a followup. Otherwise like you say the title of this PR will have to be changed to "refactor everything".
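
The batched approach suggested above (fetch stats once for the union of all source indices, then pick out the relevant parts per transform) could look roughly like the following. This is a sketch, not the code in this PR: `BatchedIndicesStatsSketch`, `exampleTransforms`, and the `fetchStats` function are hypothetical, and a single `Long` per index stands in for whatever the real indices stats response contains.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

final class BatchedIndicesStatsSketch {

    // Collect every source index referenced by any transform exactly once.
    static Set<String> unionOfSourceIndices(Map<String, List<String>> sourceIndicesByTransform) {
        Set<String> union = new TreeSet<>();
        sourceIndicesByTransform.values().forEach(union::addAll);
        return union;
    }

    // Issue one stats lookup for the union, then slice the result per transform.
    static Map<String, Map<String, Long>> statsPerTransform(
            Map<String, List<String>> sourceIndicesByTransform,
            Function<Set<String>, Map<String, Long>> fetchStats) {
        Map<String, Long> allStats = fetchStats.apply(unionOfSourceIndices(sourceIndicesByTransform));
        Map<String, Map<String, Long>> result = new LinkedHashMap<>();
        sourceIndicesByTransform.forEach((transformId, indices) -> {
            Map<String, Long> relevant = new LinkedHashMap<>();
            for (String index : indices) {
                Long value = allStats.get(index);
                if (value != null) {
                    relevant.put(index, value);
                }
            }
            result.put(transformId, relevant);
        });
        return result;
    }

    // Hypothetical example data: two transforms sharing one source index.
    static Map<String, List<String>> exampleTransforms() {
        Map<String, List<String>> transforms = new LinkedHashMap<>();
        transforms.put("t1", Arrays.asList("logs-a", "logs-b"));
        transforms.put("t2", Arrays.asList("logs-b", "logs-c"));
        return transforms;
    }

    public static void main(String[] args) {
        // Fake stats source: the "stat" of an index is just the length of its name.
        Function<Set<String>, Map<String, Long>> fakeFetch = indices -> {
            Map<String, Long> stats = new LinkedHashMap<>();
            for (String index : indices) {
                stats.put(index, (long) index.length());
            }
            return stats;
        };
        System.out.println(statsPerTransform(exampleTransforms(), fakeFetch));
    }
}
```

The point of the design is that `fetchStats` is invoked once regardless of how many transforms share an index, so the shared index `logs-b` is only looked up a single time.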

benwtrent · 2019-07-19T14:07:22Z

...ava/org/elasticsearch/xpack/dataframe/action/TransportGetDataFrameTransformsStatsAction.java

+        AtomicInteger numberRemaining = new AtomicInteger(statsForTransformsWithoutTasks.size());
+        AtomicBoolean isExceptionReported = new AtomicBoolean(false);
+
+        statsForTransformsWithoutTasks.forEach(stat -> populateSingleStoppedTransformStat(stat,


How does the cluster behave when stats are requested for a large number of transforms at the same time? This could be pretty bad if the UI loads many transforms at once to display their stats and most of them are stopped.
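
For readers unfamiliar with the pattern in the hunk above, here is a minimal sketch of a countdown-based fan-out using the same two variable names. It is an illustration under assumptions, not the PR's implementation: `SketchListener` is a hypothetical stand-in for a listener type such as `ActionListener`, and `FanOutSketch.collectAll` is invented for this example. Note that it issues one asynchronous call per input, which is exactly the scaling behaviour the question is about.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

// Hypothetical minimal callback interface.
interface SketchListener<T> {
    void onResponse(T value);
    void onFailure(Exception e);
}

final class FanOutSketch {

    // Starts one async call per input; completes finalListener exactly once.
    static <I, O> void collectAll(List<I> inputs,
                                  BiConsumer<I, SketchListener<O>> asyncCall,
                                  SketchListener<List<O>> finalListener) {
        if (inputs.isEmpty()) {
            finalListener.onResponse(Collections.emptyList());
            return;
        }
        // Results are gathered in completion order, not input order.
        List<O> results = Collections.synchronizedList(new ArrayList<>());
        AtomicInteger numberRemaining = new AtomicInteger(inputs.size());
        AtomicBoolean isExceptionReported = new AtomicBoolean(false);

        for (I input : inputs) {
            asyncCall.accept(input, new SketchListener<O>() {
                @Override
                public void onResponse(O value) {
                    results.add(value);
                    // Only the call that brings the counter to zero reports success.
                    if (numberRemaining.decrementAndGet() == 0) {
                        finalListener.onResponse(new ArrayList<>(results));
                    }
                }

                @Override
                public void onFailure(Exception e) {
                    // The counter is not decremented on failure, so it can never
                    // reach zero afterwards; only the first failure is reported.
                    if (isExceptionReported.compareAndSet(false, true)) {
                        finalListener.onFailure(e);
                    }
                }
            });
        }
    }

    public static void main(String[] args) {
        collectAll(Arrays.asList("a", "bb", "ccc"),
            (String s, SketchListener<Integer> l) -> l.onResponse(s.length()),
            new SketchListener<List<Integer>>() {
                @Override
                public void onResponse(List<Integer> value) {
                    System.out.println("lengths: " + value);
                }

                @Override
                public void onFailure(Exception e) {
                    System.out.println("failed: " + e.getMessage());
                }
            });
    }
}
```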

benwtrent · 2019-07-19T14:08:40Z

.../java/org/elasticsearch/xpack/dataframe/checkpoint/DataFrameTransformsCheckpointService.java

+     * @param nextCheckpoint the next checkpoint
+     * @param nextCheckpointIndexerState indexer state for the next checkpoint
+     * @param nextCheckpointPosition position for the next checkpoint
+     * @param nextCheckpointProgress progress for the next checkpoint


yo dog, I heard you like nextCheckpoint :D

hendrikmuhs

LGTM

Huge thanks for this!

I added some comments; there might be some follow-ups, but they should not stop us here.

hendrikmuhs · 2019-07-22T12:13:50Z

...st/java/org/elasticsearch/client/dataframe/transforms/hlrc/DataFrameTransformStatsTests.java

+            randomBoolean() ? null : randomNodeAttributes(),
+            // On the server side the stats has transform ID "_all" when embedded in another doc
+            // TODO: change this so that the outer document sets the correct transform ID during parsing
+            // It's very confusing and could cause subtle errors that the inner object has a surprising ID


Just for the record: the ID in stats is a refactoring leftover that was never removed. Stats were stored as individual documents before state and stats were combined (see the discussion above). So the TODO should be "remove the ID".

hendrikmuhs · 2019-07-22T12:22:39Z

...n/java/org/elasticsearch/xpack/core/dataframe/transforms/DataFrameIndexerTransformStats.java

@@ -133,7 +132,6 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
                throw new IllegalArgumentException("when storing transform statistics, a valid transform id must be provided");
            }
            builder.field(DataFrameField.ID.getPreferredName(), transformId);


This is a good point. The purpose of the ID was to allow the corresponding documents to be deleted (via delete-by-query) after a transform is deleted. As stats are no longer stored as standalone documents, the ID is as superfluous as the doc type, so the whole internal storage handling is obsolete.

(We can also clean up in a later PR if preferred, this is already getting complex)

hendrikmuhs · 2019-07-22T12:27:32Z

...src/main/java/org/elasticsearch/xpack/core/dataframe/transforms/DataFrameTransformStats.java

+/**
+ * Used as a wrapper for the objects returned from the stats endpoint.
+ * Objects of this class are expected to be ephemeral.
+ * Do not persist objects of this class to cluster state or an index.


hendrikmuhs · 2019-07-22T12:45:04Z

...frame/src/main/java/org/elasticsearch/xpack/dataframe/transforms/DataFrameTransformTask.java

@@ -661,9 +660,8 @@ protected void doSaveState(IndexerState indexerState, DataFrameIndexerPosition p

            // Persisting stats when we call `doSaveState` should be ok as we only call it on a state transition and


nit: this comment looks outdated, quick suggestion:

"Persist the current state and stats in the internal index. The interval of this method being called is controlled by AsyncTwoPhaseIndexer#onBulkResponse which calls doSaveState every-so-often when doing bulk indexing calls or at the end of one indexing run"

This change adjusts the data frame transforms stats endpoint to return a structure that is easier to understand. This is a breaking change for clients of the data frame transforms stats endpoint, but the feature is in beta so stability is not guaranteed. Backport of #44350

This is a followup to elastic#44350. The indexer stats used to be persisted standalone, but now are only persisted as part of a state-and-stats document. During the review of elastic#44350 it was decided that we'll stick with this design, so there will never be a need for an indexer stats object to store its transform ID as it is stored on the enclosing document. This PR removes the indexer stats document ID.
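
The design described above, where only the enclosing document knows the transform ID, can be sketched as follows. All class names here (`StateSketch`, `StatsWithoutId`, `StoredDocSketch`) are hypothetical, heavily simplified stand-ins rather than the real classes, and the fields are chosen only for illustration.

```java
// Hypothetical, simplified stand-in for the persisted transform state.
final class StateSketch {
    final String taskState;
    final long checkpoint;

    StateSketch(String taskState, long checkpoint) {
        this.taskState = taskState;
        this.checkpoint = checkpoint;
    }
}

// Indexer stats without any ID field of their own.
final class StatsWithoutId {
    final long documentsIndexed;

    StatsWithoutId(long documentsIndexed) {
        this.documentsIndexed = documentsIndexed;
    }

    // Merging needs no "same ID" check because the stats no longer carry an ID.
    StatsWithoutId merge(StatsWithoutId other) {
        return new StatsWithoutId(documentsIndexed + other.documentsIndexed);
    }
}

// The enclosing document is the single owner of the transform ID and keeps
// state and stats together, so both are always read from the same document.
final class StoredDocSketch {
    final String transformId;
    final StateSketch state;
    final StatsWithoutId stats;

    StoredDocSketch(String transformId, StateSketch state, StatsWithoutId stats) {
        this.transformId = transformId;
        this.state = state;
        this.stats = stats;
    }

    String summary() {
        return transformId + ": " + state.taskState + " at checkpoint " + state.checkpoint
            + ", " + stats.documentsIndexed + " docs indexed";
    }
}

final class StoredDocDemo {
    public static void main(String[] args) {
        StatsWithoutId merged = new StatsWithoutId(100).merge(new StatsWithoutId(20));
        StoredDocSketch doc = new StoredDocSketch("my-transform", new StateSketch("stopped", 3), merged);
        System.out.println(doc.summary());
    }
}
```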

This is a followup to #44350. The indexer stats used to be persisted standalone, but now are only persisted as part of a state-and-stats document. During the review of #44350 it was decided that we'll stick with this design, so there will never be a need for an indexer stats object to store its transform ID as it is stored on the enclosing document. This PR removes the indexer stats document ID. Backport of #44768

…cs (#46821)

The PRs that made these changes are:

- #44350
- #45276
- #45856

Co-Authored-By: István Zoltán Szabó
Co-Authored-By: Lisa Cawley

…cs (#47034)

Backport of #46821

[ML] Change response format of data frame stats endpoint

6ad765a

This change adjusts the data frame stats endpoint to return the format discussed in elastic#43767. Relates elastic#43767

droberts195 added the :ml/Transform Transform label Jul 15, 2019

droberts195 requested a review from hendrikmuhs July 15, 2019 14:31

droberts195 added 8 commits July 15, 2019 15:51

Fixing some HLRC unit tests

ba3fa68

Merge branch 'master' into change_stats_format

8edd858

Adjusting docs

711e1f0

Merge branch 'master' into change_stats_format

cc945e0

Renaming

83c3033

DataFrameTransformStateAndStatsInfo -> DataFrameTransformStats (user facing) DataFrameTransformStateAndStats -> DataFrameTransformStoredDoc (internal)

Get Java-based single/multi-node integration tests working

c6e35f2

Note: YAML tests are still TODO

Merge branch 'master' into change_stats_format

78e0392

droberts195 mentioned this pull request Jul 18, 2019

[ML] Data frame GET _stats response is confusing #43767

Closed

droberts195 added 2 commits July 18, 2019 17:25

Fix data frame YAML tests

9b96fca

Now that indexer_state has moved into checkpointing.next it is unreliable to assert on the value of indexer_state in a YAML test because the tests are so quick that the next checkpoint can easily move to last before any assertion is made.

More places where we can't assert on indexer state now

b83f44f

droberts195 mentioned this pull request Jul 19, 2019

[ML] GET _transform only returns 100 #43052

Closed

Add progress object back to next checkpoint stats

3426afb

Also renamed some more internal variables to match the user facing stats output

droberts195 marked this pull request as ready for review July 19, 2019 13:01

droberts195 added >enhancement v7.4.0 v8.0.0 labels Jul 19, 2019

droberts195 changed the title ~~[ML] Change response format of data frame stats endpoint~~ [ML] Improve response format of data frame stats endpoint Jul 19, 2019

droberts195 added >breaking >breaking-java labels Jul 19, 2019

benwtrent reviewed Jul 19, 2019

droberts195 added 2 commits July 19, 2019 16:16

Address one review comment, more naming consistency and remove warnings

d61ca6a

Merge branch 'master' into change_stats_format

cdc39c2

benwtrent approved these changes Jul 22, 2019

hendrikmuhs approved these changes Jul 22, 2019

Merge branch 'master' into change_stats_format

8cfbfbf

droberts195 deleted the change_stats_format branch July 23, 2019 09:48

This was referenced Jul 23, 2019

[ML][Data Frame] enable bwc tests again, adjusting after backport #44720

Merged

[ML] Improve response format of data frame stats endpoint #44743

Merged

droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Jul 23, 2019

Muting tests for backport

cb324dd

Mutes data frame BWC tests prior to backporting elastic#44350

droberts195 mentioned this pull request Jul 23, 2019

[ML-DataFrame] Muting tests for backport #44749

Merged

droberts195 added a commit that referenced this pull request Jul 23, 2019

[ML-DataFrame] Muting tests for backport (#44749)

6818a62

Mutes data frame BWC tests prior to backporting #44350

droberts195 mentioned this pull request Jul 23, 2019

[ML-DataFrame] Adjust data frame stats BWC following backport #44760

Merged

droberts195 mentioned this pull request Jul 23, 2019

[ML-DataFrame] Remove ID field from data frame indexer stats #44768

Merged

droberts195 added a commit that referenced this pull request Jul 23, 2019

[ML-DataFrame] Adjust data frame stats BWC following backport (#44760)

37b354e

This change adjusts the changes of #44350 to account for the backport to the 7.x branch in #44743.

droberts195 mentioned this pull request Jul 25, 2019

[ML-DataFrame] Remove ID field from data frame indexer stats #44848

Merged

droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Jul 25, 2019

[ML-DataFrame] Muting tests for backport (elastic#44749)

94a1960

Mutes data frame BWC tests prior to backporting elastic#44350

walterra mentioned this pull request Jul 29, 2019

[ML] Data Frames: Update stats data structure. elastic/kibana#42117

Merged

4 tasks

droberts195 mentioned this pull request Sep 18, 2019

[DOCS] Add 7.4 breaking changes for transforms and data frame analytics #46821

Merged

droberts195 mentioned this pull request Sep 24, 2019

[DOCS] Add 7.4 breaking changes for transforms and data frame analytics #47034

Merged

codebrain mentioned this pull request Oct 14, 2019

7.4 meta ticket elastic/elasticsearch-net#4133

Closed

56 tasks

jakelandis mentioned this pull request Feb 22, 2021

DRAFT [META] REST Compatible API V7 completeness #68905

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
