Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Super randomized tests for fetch fields API #70278

Merged
merged 24 commits into from
Mar 24, 2021

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Mar 10, 2021

We've had a few bugs in the fields API where is doesn't behave like we'd
expect. Typically this happens because it isn't obvious what we expct. So
we'll try and use randomized testing to ferret out what we want. This adds
a test for most field types that asserts that fields works similarly
to docvalues_fields. We expect this to be true for most fields.

It does so by forcing all subclasses of MapperTestCase to define a
method that makes random values. It declares a few other hooks that
subclasses can override to further randomize the test.

We skip the test for a few field types that don't have doc values:

  • annotated_text
  • completion
  • search_as_you_type
    We should come up with some way to test these without doc values, even
    if it isn't as nice. But that is a problem for another time, I think.

For text fields we use fielddata which works, but its not a great
test becuase we intentionally don't mimick docvalue_fields in
fields. So we have to pick our input data in a funny way. It's better
than nothing.

We skip the test for a few more types just because I wanted to cut this
PR in half so we could get to reviewing it earlier. We'll get to those
in a follow up change.

I've filed a few bugs for things that are inconsistent with
docvalues_fields. Typically that means that we have to limit the
random values that we generate to those that do round trip properly.

We've had a few bugs in the fields API where is doesn't behave like we'd
expect. Typically this happens because it isn't obvious what we expct. So
we'll try and use randomized testing to ferret out what we want. This adds
a test for most field types that asserts that `fields` works similarly
to `docvalues_fields`. We expect this to be true for most fields.

It does so by forcing all subclasses of `MapperTestCase` to define a
method that makes random values. It declares a few other hooks that
subclasses can override to further randomize the test.

We skip the test for a few field types that don't have doc values:
* `annotated_text`
* `completion`
* `search_as_you_type`
We should come up with some way to test these without doc values, even
if it isn't as nice. But that is a problem for another time, I think.

For `text` fields we use `fielddata` which *works*, but its not a great
test becuase we intentionally don't mimick `docvalue_fields` in
`fields`. So we have to pick our input data in a funny way. It's better
than nothing.

We skip the test for a few more types just because I wanted to cut this
PR in half so we could get to reviewing it earlier. We'll get to those
in a follow up change.

I've filed a few bugs for things that are inconsistent with
`docvalues_fields`. Typically that means that we have to limit the
random values that we generate to those that *do* round trip properly.
@@ -136,4 +137,10 @@ public void testRejectMultiValuedFields() throws MapperParsingException, IOExcep
assertEquals("[rank_feature] fields do not support indexing multiple values for the same field [foo.field] in the same document",
e.getCause().getMessage());
}

@Override
protected Supplier<? extends Object> randomFetchTestValueVendor(MappedFieldType ft) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose a Supplier here because we have a test that tests many values at once and I wanted to be able to build many values that have a similar "theme". Check out the tests for dates for an example of where this is useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain this more? I looked at the date tests but couldn't tell where having a 'supplier' was critical.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rereading this I don't think it really is critical. It was when was sorting the values and deduping them, but it isn't really critical any more. It is a bit more realistic to generate similarly themed values though. But we don't really need it any more. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to making this just an Object


@Override
protected Supplier<? extends Object> randomFetchTestValueVendor(MappedFieldType ft) {
assumeFalse("Test implemented in a follow up", true);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this because I wanted to be able to open the PR and for everything to compile. I do expect to handle this in a follow up.

@nik9000 nik9000 marked this pull request as ready for review March 15, 2021 16:23
@nik9000 nik9000 added :Search Foundations/Mapping Index mappings, including merging and defining field types >test Issues or PRs that are addressing/adding tests v7.13.0 v8.0.0 labels Mar 15, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Mar 15, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@nik9000
Copy link
Member Author

nik9000 commented Mar 15, 2021

I've removed draft. The folks I talked to were comfortable with this as an incremental approach. It might not be where we land but its a whole lot of tests we don't have now and found a few fun things already.

Copy link
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this idea. Eventually, I'd like to replace the logic in AggregatorTestCase#writeTestDoc to use these suppliers.

return Map.of(randomBoolean() ? "gt" : "gte", lhs, randomBoolean() ? "lt" : "lte", rhs);
};
}
throw new UnsupportedOperationException("unsupported field type: " + ft);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe leave a TODO comment for IP and Date ranges?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes. I forgot that one. Yes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment, maybe it's cleaner to avoid adding this initial logic until we remove the assumeFalse.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to come up with some way to make sure we don't throw it away for when we're ready for it. I can revert it from the PR though because I can use the revert commit in the PR to revive it.

@nik9000
Copy link
Member Author

nik9000 commented Mar 15, 2021

@elasticmachine, update branch

Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some (mostly small) comments. I really like the approach overall!

One high-level thought: these tests actually test more than field fetching, they also verify document parsing and doc values loading. I don't have a great naming idea, maybe testParseAndFetchValues?

@@ -1079,4 +1081,18 @@ public void testSimpleMerge() throws IOException {
assertThat(mapperService.documentMapper().mappers().getMapper("field"), instanceOf(TextFieldMapper.class));
assertThat(mapperService.documentMapper().mappers().getMapper("other_field"), instanceOf(KeywordFieldMapper.class));
}

@Override
protected Supplier<? extends Object> randomFetchTestValueVendor(MappedFieldType ft) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it doesn't feel worth testing random fetch for text fields. It's not really accurate to compare the fetch output to field data, since field data is based on analyzed tokens. Plus, the fetching/ parsing logic is straightforward and randomized tests might not bring much insight.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really think its worth it either because doc values are so different. Do you think I should just assumeFalse on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep assumeFalse here makes sense to me.

return Map.of(randomBoolean() ? "gt" : "gte", lhs, randomBoolean() ? "lt" : "lte", rhs);
};
}
throw new UnsupportedOperationException("unsupported field type: " + ft);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment, maybe it's cleaner to avoid adding this initial logic until we remove the assumeFalse.

* fetcher doesn't.
*/
// https://github.com/elastic/elasticsearch/issues/70263
// if (randomBoolean()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice to avoid commented-out logic, maybe we can remove it while we resolve the above issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++. I'll drop it and link it to the issue.

* and the native fetchers usually don't sort. We're ok with this
* difference. But we have to convince the test we're ok with it.
*/
assertThat(fromNative, containsInAnyOrder(fromDocValues.stream().map(Matchers::equalTo).collect(toList())));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain how this check differs from the below?

assertThat(fromNative, containsInAnyOrder(fromDocValues));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yours is just better. I was scanning the list of methods in Matchers and didn't see yours. I haven't a clue how I missed it.

* and {@link #testFetchMany}. The returned objects must be supported by
* {@link XContentBuilder#value(Object)}
*/
protected abstract Supplier<? extends Object> randomFetchTestValueVendor(MappedFieldType ft);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a name like randomFetchTestValueSupplier would be clearer, I haven't seen the 'vendor' terminology used elsewhere. (I could also see this name being generalized in the future to randomValueSupplier, as we add more tests that take advantage of random value generation).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I first wrote it I had named it randomValueSupplier but I figured it wasn't right to call it that if we were going to use it for other things. I can certainly rename it if you think we will. I'm happy to replace Vendor with Supplier too. Both work for me. I'm not sure why I went with vendor there. maybe I just felt like being different.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about generateRandomInputValue? Or we could reuse getSampleValueForDocument and add some randomization there?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to go with generateRandomInputValue and we can cut getSampleValueForDocument over to it in a follow up. I'd like to contain all of the "fun" randomization failures that show up from that little change to their own PR if you'll allow it.

return createMapperService(mapping(b -> b.startObject("field").field("type", "keyword").endObject()));
@Override
protected Supplier<Comparable<?>> randomFetchTestValueVendor(MappedFieldType ft) {
return () -> randomAlphaOfLengthBetween(1, 100);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we generate a broader range of valid values, like numbers and booleans?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess numbers and booleans do get converted. We should indeed generate them.

}

protected Supplier<Number> randomFetchValueVendor(NumericType nt) {
switch (nt) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be great to exercise 'coercion' here too: numbers can be specified as strings, and we also allow passing float point values to integral types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it found #70585

try {
MappedFieldType ft = mapperService.fieldType("field");
/*
* When we have many values doc values will sort and unique them.
Copy link
Contributor

@jtibshirani jtibshirani Mar 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment out-of-date? It doesn't seem we sort or deduplicate values here. This got me thinking, should we deduplicate fetched values in assertFetch before comparing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is out of date.

It looks to me like we shouldn't deduplicate because doc values doesn't deduplicate. Unless I've messed up my test somehow.

MappedFieldType ft = mapperService.fieldType("field");
assertFetch(mapperService, "field", randomFetchTestValueVendor(ft).get(), randomFetchTestFormat());
} finally {
assertFetchWarnings();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could remove this method assertFetchWarnings() and just call assertParseMinimalWarnings here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd work for now. Maybe not later. But if we need it then we can add it then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -136,4 +137,10 @@ public void testRejectMultiValuedFields() throws MapperParsingException, IOExcep
assertEquals("[rank_feature] fields do not support indexing multiple values for the same field [foo.field] in the same document",
e.getCause().getMessage());
}

@Override
protected Supplier<? extends Object> randomFetchTestValueVendor(MappedFieldType ft) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain this more? I looked at the date tests but couldn't tell where having a 'supplier' was critical.

@romseygeek
Copy link
Contributor

I'm still unsure about the tests on this one, but I think that's because I'm unsure about what we want to guarantee is returned in fields. Given no format, do we want to return:

  • the value that was added in _source - ie, precisely what would have been returned if you asked for _source filtered down to just this field
  • the value that would have been used if this field was searched as part of the original query
  • a value that can be lifted out and used in subsequent queries or aggregations

Each of these options is perfectly reasonable, and each one requires different guarantees - this PR is basically testing for the second option here, but if we instead want to go for the first or third then we'll need different tests. In particular, the fact that you can apply a format implies that fields is mainly about display, and so option 1 seems reasonable.

@nik9000
Copy link
Member Author

nik9000 commented Mar 18, 2021

@jtibshirani and @romseygeek I think this is ready for another round.

In response to @romseygeek's last comment - for the most part we're trying to be consistent with how we parse the field in the first place and to apply the format. This mostly means we line up with docvalues_fields. We allow ourselves to deviate from it if we feel like we have a good reason, but our "default" is to behave the same way.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely getting closer! I left a couple more questions.

@@ -136,4 +137,10 @@ public void testRejectMultiValuedFields() throws MapperParsingException, IOExcep
assertEquals("[rank_feature] fields do not support indexing multiple values for the same field [foo.field] in the same document",
e.getCause().getMessage());
}

@Override
protected Supplier<? extends Object> randomFetchTestValueVendor(MappedFieldType ft) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to making this just an Object


@Override
protected Supplier<? extends Object> randomFetchTestValueVendor(MappedFieldType ft) {
switch (between(0, 3)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will definitely be nicer when it doesn't return a Supplier as then we get a random mix of types within a single doc.

* and {@link #testFetchMany}. The returned objects must be supported by
* {@link XContentBuilder#value(Object)}
*/
protected abstract Supplier<? extends Object> randomFetchTestValueVendor(MappedFieldType ft);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about generateRandomInputValue? Or we could reuse getSampleValueForDocument and add some randomization there?

@nik9000
Copy link
Member Author

nik9000 commented Mar 22, 2021

OK folks! Time for another round. I've remove the Supplier. I don't like the kinds of values it makes as much but I can live with it.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - one nit and one query but no need for another go round from me.

@Override
protected final Object generateRandomInputValue(MappedFieldType ft) {
Number n = randomNumber();
return randomBoolean() ? n : n.toString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to run into #69382 and the issues with scientific format? Or does that only apply to date parsing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numbers are fine parsing scientific notation. It's dates that hate 'em. There is an issue with buried in the somewhere but I've awaitsFixed that bit and opened another issue.

}

/**
* Create a vendor for similar random values to round trip in {@link #testFetch}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'Generates random values to round trip in...`

@nik9000 nik9000 merged commit 91c700b into elastic:master Mar 24, 2021
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Mar 24, 2021
We've had a few bugs in the fields API where is doesn't behave like we'd
expect. Typically this happens because it isn't obvious what we expct. So
we'll try and use randomized testing to ferret out what we want. This adds
a test for most field types that asserts that `fields` works similarly
to `docvalues_fields`. We expect this to be true for most fields.

It does so by forcing all subclasses of `MapperTestCase` to define a
method that makes random values. It declares a few other hooks that
subclasses can override to further randomize the test.

We skip the test for a few field types that don't have doc values:
* `annotated_text`
* `completion`
* `search_as_you_type`
* `text`
We should come up with some way to test these without doc values, even
if it isn't as nice. But that is a problem for another time, I think.

We skip the test for a few more types just because I wanted to cut this
PR in half so we could get to reviewing it earlier. We'll get to those
in a follow up change.

I've filed a few bugs for things that are inconsistent with
`docvalues_fields`. Typically that means that we have to limit the
random values that we generate to those that *do* round trip properly.
nik9000 added a commit that referenced this pull request Mar 24, 2021
)

We've had a few bugs in the fields API where is doesn't behave like we'd
expect. Typically this happens because it isn't obvious what we expct. So
we'll try and use randomized testing to ferret out what we want. This adds
a test for most field types that asserts that `fields` works similarly
to `docvalues_fields`. We expect this to be true for most fields.

It does so by forcing all subclasses of `MapperTestCase` to define a
method that makes random values. It declares a few other hooks that
subclasses can override to further randomize the test.

We skip the test for a few field types that don't have doc values:
* `annotated_text`
* `completion`
* `search_as_you_type`
* `text`
We should come up with some way to test these without doc values, even
if it isn't as nice. But that is a problem for another time, I think.

We skip the test for a few more types just because I wanted to cut this
PR in half so we could get to reviewing it earlier. We'll get to those
in a follow up change.

I've filed a few bugs for things that are inconsistent with
`docvalues_fields`. Typically that means that we have to limit the
random values that we generate to those that *do* round trip properly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team >test Issues or PRs that are addressing/adding tests v7.13.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants