-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add declarative parameters to FieldMappers #58663
Conversation
Pinging @elastic/es-search (:Search/Mapping) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this! I left a few questions.
@@ -157,8 +157,6 @@ static KeywordFieldMapper createExtractQueryFieldBuilder(String name, BuilderCon | |||
static BinaryFieldMapper createQueryBuilderFieldBuilder(BuilderContext context) { | |||
BinaryFieldMapper.Builder builder = new BinaryFieldMapper.Builder(QUERY_BUILDER_FIELD_NAME); | |||
builder.docValues(true); | |||
builder.indexOptions(IndexOptions.NONE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self - these were the defaults so they are safe to drop.
|
||
public static class Builder extends ParametrizedFieldMapper.Builder { | ||
|
||
final ParametrizedFieldMapper.Parameter<Boolean> fixed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd import the inner class to make these a little shorter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know some folks hate importing inner classes but I don't really have hard and fast rules and think in this case it'd be easier to read.
return builder; | ||
} | ||
|
||
public static Parameter<Boolean> boolParam(String name, boolean updateable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These smell a little like our settings which have grown a bit difficult to read over the years. There, I think a builder pattern would have helped deal with how many optional things there are. But here I dunno.
I think merger might need to take the current value so it can make a call on more complex stuff, right?
If a value isn't updateable do you need to define a merger at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, looking above you use acceptsNull
in a very bulder like way, which I'm cool with.
I think it'd be good to make the "for declaration" method public
with javadoc. And the "for use by ParameterizedFieldMapper
" methods private
or at least package private. That way it is more obvious what we can call when building the mappers.
I'm sort of sad that this is bit of a parallel world to the xcontent parsers that we already have, but I know how we got here. And it is where we are. So, yeah, I'm +1 on building it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ to making the 'used internally only' methods private. I've renamed 'merger' to 'initializer' as that's what it does, takes an initial value from an existing mapper.
|
||
public final void merge(FieldMapper in, Conflicts conflicts) { | ||
for (Parameter<?> param : getParameters()) { | ||
if (param.updateable == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it'd be clearer if merge
checked for updateable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely, good spot. Begone, freeze
|
||
public Parameter<T> init(FieldMapper toInit) { | ||
assert frozen == false; | ||
this.value = merger.apply(toInit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me think merger
isn't really the right name for this, actually - it is more like read
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ I've used initializer
super(name); | ||
} | ||
|
||
// For testing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you make them package private that does a pretty good job implying they are only exposed for testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used outside the package, annoyingly. I've changed it so that if you need to set it programmatically you can use a constructor parameter.
For some reason, the ML mappings (and only the ML mappings) are now being serialized differently, even though this PR makes no changes to ObjectMapper serialization. 🤷 |
Neat! Always good to find a new surprise. |
|
||
import static org.hamcrest.Matchers.containsString; | ||
import static org.hamcrest.Matchers.equalTo; | ||
import static org.hamcrest.Matchers.instanceOf; | ||
|
||
public class BinaryFieldMapperTests extends FieldMapperTestCase<BinaryFieldMapper.Builder> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this we don't test store
and stuff like that, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! So you have the tests in ParameterizedFieldTest
for things like this. Do you think it is important to also test that the field merges here too? I guess not, but I'm worried about dropping something accidentally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea would be that we don't have 'common' fields any more - all mappers declare all their parameters outright (I am sharing the 'meta' parameter at the moment, but I think I'm going to stop doing that). So the tests for each field mapper should cover all the parameters that they declare, but general things like 'this parameter can't be updated so it should throw an error on merge' are tested in ParametrizedFieldTest
.
|
||
import static org.hamcrest.Matchers.containsString; | ||
|
||
public class BooleanFieldMapperTests extends FieldMapperTestCase<BooleanFieldMapper.Builder> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it too, I left a couple of questions
context.doc().add(new Field(fieldType().name(), value ? "T" : "F", Defaults.FIELD_TYPE)); | ||
} | ||
if (stored) { | ||
context.doc().add(new StoredField(fieldType().name(), value ? "T" : "F")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like if the field is searchable and stored too, we end up adding two fields, while before we would only add one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a field with a FieldType
that has stored
set to true
and adding a separate StoredField
amount to the same thing. This way, we don't need to change the field type at all, and it's clearer what's happening at the point where we're adding documents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was odd that we were using Field
to get stored fields but doc values. Field
just does a bunch of stuff.
= new Parameter<>("null_value", false, null, (n, o) -> XContentMapValues.nodeBooleanValue(o), m -> toType(m).nullValue); | ||
private final Parameter<Boolean> stored = Parameter.boolParam("store", false, m -> toType(m).stored, false); | ||
private final Parameter<Map<String, String>> meta | ||
= new Parameter<>("meta", true, Collections.emptyMap(), TypeParsers::parseMeta, m -> m.fieldType().meta()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get confused here that stored seems to be treated differently from e.g. doc_values. Why is that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convenience, mainly - doc_values
and index
can be derived from the field type values, but stored
can't be. I can change it so that doc_values
and index
are fields directly on the mapper as well if you think that's clearer? Certainly for things like searchable docvalues fields then we'll need to distinguish between whether or not something is indexed and whether it's searchable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I am trying to see what the difference is between stored and indexed/doc_values. I would expect them to be treated in the same way/same place. Why can't stored be derived from the field type values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it stands, MappedFieldType
doesn't have anything that tells you if the field is stored or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You made a boolean hasDocValues
on BinaryFieldMapper
for this. Is it worth making one here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add indexed
and hasDocValues
directly to the mappers as it sounds as though that will be less confusing.
Map.Entry<String, Object> entry = iterator.next(); | ||
final String propName = entry.getKey(); | ||
final Object propNode = entry.getValue(); | ||
if (Objects.equals("fields", propName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this means that every field supports multi fields and copy_to, I think. But I think that runtime fields won't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add disabling feature flags later if we need to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every field does support multi fields and copy_to right now. I'm happy to juggle things for runtime fields in a follow up.
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, though I think @javanna might want another look too.
@@ -332,6 +332,13 @@ public static String nodeStringValue(Object node, String defaultValue) { | |||
return node.toString(); | |||
} | |||
|
|||
public static String nodeStringValue(Object node) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its probably worth adding javadoc about how this is different from Objects.toString
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
private final Parameter<Boolean> stored = Parameter.boolParam("store", false, m -> toType(m).stored, false); | ||
private final Parameter<Boolean> hasDocValues = Parameter.boolParam("doc_values", false, m -> toType(m).hasDocValues, false); | ||
private final Parameter<Map<String, String>> meta | ||
= new Parameter<>("meta", true, Collections.emptyMap(), TypeParsers::parseMeta, m -> m.fieldType().meta()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we plan to eventually move meta
over to the FieldMapper
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meta
is used in field caps, which currently are based off MappedFieldType
so I think it will probably stay there?
= new Parameter<>("null_value", false, null, (n, o) -> XContentMapValues.nodeBooleanValue(o), m -> toType(m).nullValue); | ||
private final Parameter<Boolean> stored = Parameter.boolParam("store", false, m -> toType(m).stored, false); | ||
private final Parameter<Map<String, String>> meta | ||
= new Parameter<>("meta", true, Collections.emptyMap(), TypeParsers::parseMeta, m -> m.fieldType().meta()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You made a boolean hasDocValues
on BinaryFieldMapper
for this. Is it worth making one here too?
context.doc().add(new Field(fieldType().name(), value ? "T" : "F", Defaults.FIELD_TYPE)); | ||
} | ||
if (stored) { | ||
context.doc().add(new StoredField(fieldType().name(), value ? "T" : "F")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was odd that we were using Field
to get stored fields but doc values. Field
just does a bunch of stuff.
Map.Entry<String, Object> entry = iterator.next(); | ||
final String propName = entry.getKey(); | ||
final Object propNode = entry.getValue(); | ||
if (Objects.equals("fields", propName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every field does support multi fields and copy_to right now. I'm happy to juggle things for runtime fields in a follow up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The FieldMapper infrastructure currently has a bunch of shared parameters, many of which are only applicable to a subset of the 41 mapper implementations we ship with. Merging, parsing and serialization of these parameters are spread around the class hierarchy, with much repetitive boilerplate code required. It would be much easier to reason about these things if we could declare the parameter set of each FieldMapper directly in the implementing class, and share the parsing, merging and serialization logic instead. This commit is a first effort at introducing a declarative parameter style. It adds a new FieldMapper subclass, ParametrizedFieldMapper, and refactors two mappers, Boolean and Binary, to use it. Parameters are declared on Builder classes, with the declaration including the parameter name, whether or not it is updateable, a default value, how to parse it from mappings, and how to extract it from another mapper at merge time. Builders have a getParameters method, which returns a list of the declared parameters; this is then used for parsing, merging and serialization. Merging is achieved by constructing a new Builder from the existing Mapper, and merging in values from the merging Mapper; conflicts are all caught at this point, and if none exist then a new, merged, Mapper can be built from the Builder. This allows all values on the Mapper to be final. Other mappers can be gradually migrated to this new style, and once they have all been refactored we can merge ParametrizedFieldMapper and FieldMapper entirely.
@romseygeek this is an exciting change! While updating the
Now I think we can't configure the parameters directly, they can only be set by parsing a mapping definition. Do you have any ideas around making field mapper construction easier? |
@jtibshirani I've tried out a couple of ideas here, but nothing is very satisfying - in particular, it's hard to put anything on the base |
I also noticed that most unit tests don't create builders directly. However I think this just points the fact that we rely too much on integration testing (or 'pseudo unit tests') to test mapper functionality.
This could work, I'll experiment with this as I keep the branch up to date. |
The FieldMapper infrastructure currently has a bunch of shared parameters, many of which
are only applicable to a subset of the 41 mapper implementations we ship with. Merging,
parsing and serialization of these parameters are spread around the class hierarchy, with
much repetitive boilerplate code required. It would be much easier to reason about these
things if we could declare the parameter set of each FieldMapper directly in the implementing
class, and share the parsing, merging and serialization logic instead.
This commit is a first effort at introducing a declarative parameter style. It adds a new FieldMapper
subclass, ParametrizedFieldMapper, and refactors two mappers, Boolean and Binary, to use it.
Parameters are declared on Builder classes, with the declaration including the parameter name,
whether or not it is updateable, a default value, how to parse it from mappings, and how to
extract it from another mapper at merge time. Builders have a
getParameters
method, whichreturns a list of the declared parameters; this is then used for parsing, merging and serialization.
Merging is achieved by constructing a new Builder from the existing Mapper, and merging in
values from the merging Mapper; conflicts are all caught at this point, and if none exist then a new,
merged, Mapper can be built from the Builder. This allows all values on the Mapper to be final.
Other mappers can be gradually migrated to this new style, and once they have all been refactored
we can merge ParametrizedFieldMapper and FieldMapper entirely.