-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Super randomized tests for fetch fields API #70278
Conversation
We've had a few bugs in the fields API where is doesn't behave like we'd expect. Typically this happens because it isn't obvious what we expct. So we'll try and use randomized testing to ferret out what we want. This adds a test for most field types that asserts that `fields` works similarly to `docvalues_fields`. We expect this to be true for most fields. It does so by forcing all subclasses of `MapperTestCase` to define a method that makes random values. It declares a few other hooks that subclasses can override to further randomize the test. We skip the test for a few field types that don't have doc values: * `annotated_text` * `completion` * `search_as_you_type` We should come up with some way to test these without doc values, even if it isn't as nice. But that is a problem for another time, I think. For `text` fields we use `fielddata` which *works*, but its not a great test becuase we intentionally don't mimick `docvalue_fields` in `fields`. So we have to pick our input data in a funny way. It's better than nothing. We skip the test for a few more types just because I wanted to cut this PR in half so we could get to reviewing it earlier. We'll get to those in a follow up change. I've filed a few bugs for things that are inconsistent with `docvalues_fields`. Typically that means that we have to limit the random values that we generate to those that *do* round trip properly.
@@ -136,4 +137,10 @@ public void testRejectMultiValuedFields() throws MapperParsingException, IOExcep | |||
assertEquals("[rank_feature] fields do not support indexing multiple values for the same field [foo.field] in the same document", | |||
e.getCause().getMessage()); | |||
} | |||
|
|||
@Override | |||
protected Supplier<? extends Object> randomFetchTestValueVendor(MappedFieldType ft) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose a Supplier
here because we have a test that tests many values at once and I wanted to be able to build many values that have a similar "theme". Check out the tests for dates for an example of where this is useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain this more? I looked at the date tests but couldn't tell where having a 'supplier' was critical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rereading this I don't think it really is critical. It was when was sorting the values and deduping them, but it isn't really critical any more. It is a bit more realistic to generate similarly themed values though. But we don't really need it any more. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to making this just an Object
|
||
@Override | ||
protected Supplier<? extends Object> randomFetchTestValueVendor(MappedFieldType ft) { | ||
assumeFalse("Test implemented in a follow up", true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this because I wanted to be able to open the PR and for everything to compile. I do expect to handle this in a follow up.
Pinging @elastic/es-search (Team:Search) |
I've removed draft. The folks I talked to were comfortable with this as an incremental approach. It might not be where we land but its a whole lot of tests we don't have now and found a few fun things already. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this idea. Eventually, I'd like to replace the logic in AggregatorTestCase#writeTestDoc
to use these suppliers.
return Map.of(randomBoolean() ? "gt" : "gte", lhs, randomBoolean() ? "lt" : "lte", rhs); | ||
}; | ||
} | ||
throw new UnsupportedOperationException("unsupported field type: " + ft); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe leave a TODO comment for IP and Date ranges?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yikes. I forgot that one. Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small comment, maybe it's cleaner to avoid adding this initial logic until we remove the assumeFalse
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to come up with some way to make sure we don't throw it away for when we're ready for it. I can revert it from the PR though because I can use the revert commit in the PR to revive it.
@elasticmachine, update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some (mostly small) comments. I really like the approach overall!
One high-level thought: these tests actually test more than field fetching, they also verify document parsing and doc values loading. I don't have a great naming idea, maybe testParseAndFetchValues
?
@@ -1079,4 +1081,18 @@ public void testSimpleMerge() throws IOException { | |||
assertThat(mapperService.documentMapper().mappers().getMapper("field"), instanceOf(TextFieldMapper.class)); | |||
assertThat(mapperService.documentMapper().mappers().getMapper("other_field"), instanceOf(KeywordFieldMapper.class)); | |||
} | |||
|
|||
@Override | |||
protected Supplier<? extends Object> randomFetchTestValueVendor(MappedFieldType ft) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it doesn't feel worth testing random fetch for text fields. It's not really accurate to compare the fetch output to field data, since field data is based on analyzed tokens. Plus, the fetching/ parsing logic is straightforward and randomized tests might not bring much insight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really think its worth it either because doc values are so different. Do you think I should just assumeFalse
on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep assumeFalse
here makes sense to me.
return Map.of(randomBoolean() ? "gt" : "gte", lhs, randomBoolean() ? "lt" : "lte", rhs); | ||
}; | ||
} | ||
throw new UnsupportedOperationException("unsupported field type: " + ft); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small comment, maybe it's cleaner to avoid adding this initial logic until we remove the assumeFalse
.
* fetcher doesn't. | ||
*/ | ||
// https://github.com/elastic/elasticsearch/issues/70263 | ||
// if (randomBoolean()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's nice to avoid commented-out logic, maybe we can remove it while we resolve the above issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++. I'll drop it and link it to the issue.
* and the native fetchers usually don't sort. We're ok with this | ||
* difference. But we have to convince the test we're ok with it. | ||
*/ | ||
assertThat(fromNative, containsInAnyOrder(fromDocValues.stream().map(Matchers::equalTo).collect(toList()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain how this check differs from the below?
assertThat(fromNative, containsInAnyOrder(fromDocValues));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yours is just better. I was scanning the list of methods in Matchers
and didn't see yours. I haven't a clue how I missed it.
* and {@link #testFetchMany}. The returned objects must be supported by | ||
* {@link XContentBuilder#value(Object)} | ||
*/ | ||
protected abstract Supplier<? extends Object> randomFetchTestValueVendor(MappedFieldType ft); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a name like randomFetchTestValueSupplier
would be clearer, I haven't seen the 'vendor' terminology used elsewhere. (I could also see this name being generalized in the future to randomValueSupplier
, as we add more tests that take advantage of random value generation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I first wrote it I had named it randomValueSupplier
but I figured it wasn't right to call it that if we were going to use it for other things. I can certainly rename it if you think we will. I'm happy to replace Vendor
with Supplier
too. Both work for me. I'm not sure why I went with vendor there. maybe I just felt like being different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about generateRandomInputValue
? Or we could reuse getSampleValueForDocument
and add some randomization there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to go with generateRandomInputValue
and we can cut getSampleValueForDocument
over to it in a follow up. I'd like to contain all of the "fun" randomization failures that show up from that little change to their own PR if you'll allow it.
return createMapperService(mapping(b -> b.startObject("field").field("type", "keyword").endObject())); | ||
@Override | ||
protected Supplier<Comparable<?>> randomFetchTestValueVendor(MappedFieldType ft) { | ||
return () -> randomAlphaOfLengthBetween(1, 100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we generate a broader range of valid values, like numbers and booleans?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I guess numbers and booleans do get converted. We should indeed generate them.
} | ||
|
||
protected Supplier<Number> randomFetchValueVendor(NumericType nt) { | ||
switch (nt) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be great to exercise 'coercion' here too: numbers can be specified as strings, and we also allow passing float point values to integral types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And it found #70585
try { | ||
MappedFieldType ft = mapperService.fieldType("field"); | ||
/* | ||
* When we have many values doc values will sort and unique them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment out-of-date? It doesn't seem we sort or deduplicate values here. This got me thinking, should we deduplicate fetched values in assertFetch
before comparing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is out of date.
It looks to me like we shouldn't deduplicate because doc values doesn't deduplicate. Unless I've messed up my test somehow.
MappedFieldType ft = mapperService.fieldType("field"); | ||
assertFetch(mapperService, "field", randomFetchTestValueVendor(ft).get(), randomFetchTestFormat()); | ||
} finally { | ||
assertFetchWarnings(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could remove this method assertFetchWarnings()
and just call assertParseMinimalWarnings
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd work for now. Maybe not later. But if we need it then we can add it then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -136,4 +137,10 @@ public void testRejectMultiValuedFields() throws MapperParsingException, IOExcep | |||
assertEquals("[rank_feature] fields do not support indexing multiple values for the same field [foo.field] in the same document", | |||
e.getCause().getMessage()); | |||
} | |||
|
|||
@Override | |||
protected Supplier<? extends Object> randomFetchTestValueVendor(MappedFieldType ft) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain this more? I looked at the date tests but couldn't tell where having a 'supplier' was critical.
I'm still unsure about the tests on this one, but I think that's because I'm unsure about what we want to guarantee is returned in
Each of these options is perfectly reasonable, and each one requires different guarantees - this PR is basically testing for the second option here, but if we instead want to go for the first or third then we'll need different tests. In particular, the fact that you can apply a format implies that |
@jtibshirani and @romseygeek I think this is ready for another round. In response to @romseygeek's last comment - for the most part we're trying to be consistent with how we parse the field in the first place and to apply the format. This mostly means we line up with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely getting closer! I left a couple more questions.
@@ -136,4 +137,10 @@ public void testRejectMultiValuedFields() throws MapperParsingException, IOExcep | |||
assertEquals("[rank_feature] fields do not support indexing multiple values for the same field [foo.field] in the same document", | |||
e.getCause().getMessage()); | |||
} | |||
|
|||
@Override | |||
protected Supplier<? extends Object> randomFetchTestValueVendor(MappedFieldType ft) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to making this just an Object
|
||
@Override | ||
protected Supplier<? extends Object> randomFetchTestValueVendor(MappedFieldType ft) { | ||
switch (between(0, 3)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will definitely be nicer when it doesn't return a Supplier
as then we get a random mix of types within a single doc.
* and {@link #testFetchMany}. The returned objects must be supported by | ||
* {@link XContentBuilder#value(Object)} | ||
*/ | ||
protected abstract Supplier<? extends Object> randomFetchTestValueVendor(MappedFieldType ft); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about generateRandomInputValue
? Or we could reuse getSampleValueForDocument
and add some randomization there?
OK folks! Time for another round. I've remove the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - one nit and one query but no need for another go round from me.
@Override | ||
protected final Object generateRandomInputValue(MappedFieldType ft) { | ||
Number n = randomNumber(); | ||
return randomBoolean() ? n : n.toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to run into #69382 and the issues with scientific format? Or does that only apply to date parsing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Numbers are fine parsing scientific notation. It's dates that hate 'em. There is an issue with buried in the somewhere but I've awaitsFixed that bit and opened another issue.
} | ||
|
||
/** | ||
* Create a vendor for similar random values to round trip in {@link #testFetch} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'Generates random values to round trip in...`
We've had a few bugs in the fields API where is doesn't behave like we'd expect. Typically this happens because it isn't obvious what we expct. So we'll try and use randomized testing to ferret out what we want. This adds a test for most field types that asserts that `fields` works similarly to `docvalues_fields`. We expect this to be true for most fields. It does so by forcing all subclasses of `MapperTestCase` to define a method that makes random values. It declares a few other hooks that subclasses can override to further randomize the test. We skip the test for a few field types that don't have doc values: * `annotated_text` * `completion` * `search_as_you_type` * `text` We should come up with some way to test these without doc values, even if it isn't as nice. But that is a problem for another time, I think. We skip the test for a few more types just because I wanted to cut this PR in half so we could get to reviewing it earlier. We'll get to those in a follow up change. I've filed a few bugs for things that are inconsistent with `docvalues_fields`. Typically that means that we have to limit the random values that we generate to those that *do* round trip properly.
) We've had a few bugs in the fields API where is doesn't behave like we'd expect. Typically this happens because it isn't obvious what we expct. So we'll try and use randomized testing to ferret out what we want. This adds a test for most field types that asserts that `fields` works similarly to `docvalues_fields`. We expect this to be true for most fields. It does so by forcing all subclasses of `MapperTestCase` to define a method that makes random values. It declares a few other hooks that subclasses can override to further randomize the test. We skip the test for a few field types that don't have doc values: * `annotated_text` * `completion` * `search_as_you_type` * `text` We should come up with some way to test these without doc values, even if it isn't as nice. But that is a problem for another time, I think. We skip the test for a few more types just because I wanted to cut this PR in half so we could get to reviewing it earlier. We'll get to those in a follow up change. I've filed a few bugs for things that are inconsistent with `docvalues_fields`. Typically that means that we have to limit the random values that we generate to those that *do* round trip properly.
We've had a few bugs in the fields API where is doesn't behave like we'd
expect. Typically this happens because it isn't obvious what we expct. So
we'll try and use randomized testing to ferret out what we want. This adds
a test for most field types that asserts that
fields
works similarlyto
docvalues_fields
. We expect this to be true for most fields.It does so by forcing all subclasses of
MapperTestCase
to define amethod that makes random values. It declares a few other hooks that
subclasses can override to further randomize the test.
We skip the test for a few field types that don't have doc values:
annotated_text
completion
search_as_you_type
We should come up with some way to test these without doc values, even
if it isn't as nice. But that is a problem for another time, I think.
For
text
fields we usefielddata
which works, but its not a greattest becuase we intentionally don't mimick
docvalue_fields
infields
. So we have to pick our input data in a funny way. It's betterthan nothing.
We skip the test for a few more types just because I wanted to cut this
PR in half so we could get to reviewing it earlier. We'll get to those
in a follow up change.
I've filed a few bugs for things that are inconsistent with
docvalues_fields
. Typically that means that we have to limit therandom values that we generate to those that do round trip properly.