Mappings: disallow exotic options on meta fields #8143
Comments
+1 I get the feeling a number of these options were added just in case the current settings didn't work out so well, but I think we can safely declare them battle tested now |
+1 |
Removed the |
I started work on this, but realized the branch was becoming massive with all the test fixes for different meta fields. I'm now attempting to have a separate PR for each meta field. First one is simple, just _uid: #9836 |
Also, cleanup writePre20Settings so it is shared across all field mappers. see elastic#8143
There are two implications to this change. First, percolator now uses _uid internally, extracting the id portion when needed. Second, sorting on _id is no longer possible, since you can no longer index _id. However, _uid can still be used to sort, and is better anyways as indexing _id just to make it available to fielddata for sorting is wasteful. see elastic#8143 closes elastic#9842
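As a rough illustration of what "extracting the id portion" of a _uid involves, here is a minimal Python sketch. It assumes the 1.x-style layout where _uid is the type and the id joined by a `#` delimiter, and that type names never contain `#`. The helper names `create_uid` and `split_uid` are hypothetical and are not the Elasticsearch API.

```python
from typing import Tuple

# Assumption: _uid has the form "<type>#<id>" and type names never contain "#".
UID_DELIMITER = "#"


def create_uid(doc_type: str, doc_id: str) -> str:
    """Build a _uid-like string from a type and an id (hypothetical helper)."""
    return doc_type + UID_DELIMITER + doc_id


def split_uid(uid: str) -> Tuple[str, str]:
    """Split a _uid-like string into (type, id) at the first delimiter."""
    doc_type, sep, doc_id = uid.partition(UID_DELIMITER)
    if not sep:
        raise ValueError("not a valid uid: %r" % uid)
    return doc_type, doc_id


# Sorting on the combined value groups documents by type, then by id
# (plain string order), so no separately indexed _id is needed for ordering.
uids = [create_uid("user", "7"), create_uid("tweet", "2"), create_uid("tweet", "1")]
sorted_ids = [split_uid(u)[1] for u in sorted(uids)]
```

Splitting at the first delimiter means an id that itself contains `#` still round-trips, under the stated assumption about type names.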
This also changes the stored setting for _size to true (for indexes created in 2.x). see elastic#8143 closes elastic#9913
While the parser allowed changing field type settings, these would never have been serialized. So this change simply removes parsing using parseField. Backcompat will still work if a user uploads old settings (they just would never have worked anyways, so we continue ignoring them with 1.x, and 2.x will now error). see elastic#8143 closes elastic#9914
|
Thanks @rjernst these changes are already awesome! |
Meta fields were locked down to not allow exotic options to the underlying field types in elastic#8143. This change fixes the docs to no longer refer to the old settings. closes elastic#10879
This is a follow up to elastic#8143 and elastic#6730 for _timestamp. It removes support for `path`, as well as any field type settings, and enables docvalues for _timestamp, for 2.0. Users who need to adjust these settings can use a date field.
In regards to the _type disabling, I've used this historically when I already had a field on my document that was the same value as the _type, and therefore indexing it was not necessary since I would craft the queries to filter on that field instead. For folks that have billions of very small documents, it was my understanding that not indexing _type was helpful as the ratio of that data compared to overall document size was larger. |
@djschny if there is a field that duplicates a meta field, why can't the user not send that field? _type is essentially a virtual field on _uid, so it is actually not using any more memory than would be used normally. |
@rjernst Sure, a user could not send that field, but that may not be ideal for them as they want their document to contain that field explicitly for when pulling data back out of ES. I understand that _type is essentially a virtual field on _uid, but it is my understanding that unless it is indexed separately, queries to find all docs of a particular type use a prefix query on _id, which is not as ideal as a filter. Why I bring this up is there are valid use cases for the configuration of _type to not be indexed and it's not an "exotic" use case. I think it's very important to draw a distinction between a configuration that is invalid or not necessary (like the "store" on _type) vs. ones that have a practical use case. |
I said "essentially" because it wasn't quite true. If a user doesn't want to pay that cost, they can not use types. By that I mean: send all their documents with the same ES type (the type will still be indexed, but the posting list will be highly compressed because all docs will have the same value). Then they can make their |
We have some mapping options that sound interesting but are actually almost useless or dangerous. I propose to remove them:
_type: { index: no }
Not indexing the _type sounds appealing since documentation mentions everything will keep on working, so that should just save space. Except that elasticsearch will internally run a prefix query on the _uid field instead, which is going to be super slow. I think we should remove this option.
_type: { store: yes } and _id: { store: yes }
Storing the type and _id is useless since we already enforce the _uid to be stored, and the _uid contains this information.
_id: { index: not_analyzed }
The _id field is the same for all documents so we should not need to index, store or doc-value it. (Can be done now thanks to #6073 and #7965)
In general I'm wondering if we shouldn't go further and completely lock down how data is indexed/stored/docvalued for meta fields. There would just remain high-level configuration options such as enabled on the _timestamp mapper or type on _parent.
Relates to #8870
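To make the cost argument for `_type: { index: no }` concrete, here is a toy Python model. It is only a sketch under assumptions: it treats the _uid terms as a sorted list of `type#id` strings and an indexed _type as a simple term-to-postings map, which is not how Lucene is actually implemented. The function names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def build_indexes(docs):
    """docs is a list of (type, id) pairs. Returns a sorted list of
    "type#id" terms and a type -> [ids] postings map (toy structures)."""
    uid_terms = sorted("%s#%s" % (t, i) for t, i in docs)
    type_postings = defaultdict(list)
    for t, i in docs:
        type_postings[t].append(i)
    return uid_terms, type_postings


def ids_by_type_prefix(uid_terms, doc_type):
    """Without an indexed _type: scan every _uid term sharing the prefix.
    Each document has its own term, so the scan visits one term per match."""
    prefix = doc_type + "#"
    pos = bisect_left(uid_terms, prefix)
    ids = []
    while pos < len(uid_terms) and uid_terms[pos].startswith(prefix):
        ids.append(uid_terms[pos][len(prefix):])
        pos += 1
    return ids


def ids_by_type_term(type_postings, doc_type):
    """With an indexed _type: a single term lookup returns the postings."""
    return list(type_postings.get(doc_type, []))


docs = [("tweet", "2"), ("user", "1"), ("tweet", "1")]
uid_terms, type_postings = build_indexes(docs)
```

Both lookups return the same documents. The difference in this model is that the prefix variant has to walk as many distinct terms as there are matching documents, while the term variant touches one term.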