Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML][Data Frame] make response.count be total count of hits #43241

Merged

Conversation

benwtrent
Copy link
Member

Even though we support pagination for certain resources, there is no information returned to the end user to tell them that they need to look in the next page of resources.

This PR adjusts the the count returned in the QueryPage type objects, and in DataFrameStats to be the totalhits of the query ran. This will show the user how many total documents match their simple ID patterned query.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

@@ -153,16 +155,33 @@ public Response(List<DataFrameTransformStateAndStats> transformsStateAndStats, L
public Response(StreamInput in) throws IOException {
super(in);
transformsStateAndStats = in.readList(DataFrameTransformStateAndStats::new);
if (in.getVersion().onOrAfter(Version.CURRENT)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work when you don't provide a concrete version (e.g. 7.3.0) but "CURRENT"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@przemekwitek CURRENT is whatever the latest version is in that branch. Honestly, I probably could make it 7.3.0 as there are no BWC tests for Dataframe transforms yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@przemekwitek my personal workflow when doing things that could break BWC is:

  • Set the serialization code to refer to CURRENT for the change
  • Merge that PR
  • Prepare for backport with appropriate version branch and change CURRENT to that version
  • Mute the appropriate BWC tests in Master
  • Merge the backport
  • Unmute and change from CURRENT to real version in Master PR.

The reason I do it this way is so the code gets exercised through the BWC tests throughout the process. If I muted BWC in this PR, I could potentially miss other BWC test failures.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation Ben!

@@ -174,15 +193,15 @@ public void readFrom(StreamInput in) {
public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
builder.startObject();
toXContentCommon(builder, params);
builder.field(DataFrameField.COUNT.getPreferredName(), transformsStateAndStats.size());
builder.field(DataFrameField.COUNT.getPreferredName(), totalCount == 0 ? transformsStateAndStats.size() : totalCount);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can "0" be a valid value of totalCount? If so, totalCount should likely be nullable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will always be the case that totalCount <= transformsStateAndStats.size().

I see what your concern, but given how this class is used, there are no use cases where totalCount is < transformsStateAndStats.size(). I will try to make this more obvious and safe in the class.

expandedIds -> {
request.setExpandedIds(new HashSet<>(expandedIds));
request.setNodes(DataFrameNodes.dataFrameTaskNodes(expandedIds, clusterService.state()));
expandedIdsAndHits -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/expandedIdsAndHits/hitsAndExpandedIds/
to make variable name indicate the order of tuple elements?

@benwtrent
Copy link
Member Author

run elasticsearch-ci/1

@@ -235,7 +240,7 @@ public void expandTransformIds(String transformIdsExpression, PageParams pagePar
requiredMatches.unmatchedIdsString())));
return;
}
foundIdsListener.onResponse(ids);
foundIdsListener.onResponse(new Tuple<>(totalHits, ids));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] Alternatively you could use Tuple.tuple factory method instead of a constructor here.

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@benwtrent benwtrent merged commit 838d4c4 into elastic:master Jun 19, 2019
@benwtrent benwtrent deleted the feature/ml-df-make-count-total-doc-count branch June 19, 2019 18:37
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Jun 19, 2019
…43241)

* [ML][Data Frame] make response.count be total count of hits

* addressing line length check

* changing response count for filters

* adjusting serialization, variable name, and total count logic

* making count mandatory for creation
benwtrent added a commit that referenced this pull request Jun 19, 2019
…43389)

* [ML][Data Frame] make response.count be total count of hits

* addressing line length check

* changing response count for filters

* adjusting serialization, variable name, and total count logic

* making count mandatory for creation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants