-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove possibility for conflicting field definitions and ambiguous field resolution #8870
Comments
what about fields with different path (but same name) and different mapping in one type? |
@OlegYch Since we will no longer support short names, only the full path is used to identify a field, so it is only the path that matters. |
we are using 'mapping types' precisely to store different kinds of documents in the same index (so that we can use parent/child queries) |
As you said, conflicts are rare (for most people). Normally a field with the same name refers to the same type of data. The alternative (eg prefixing the field name with the type) would create much more sparsity in the index, and would impact every cross-type query (as multiple fields would need to be queried). Right now, we have opted for making the common case correct and efficient. However, we have left the APIs as they are in case, sometime in the future, we manage to figure out a cleverer way of handling conflicting field definitions. |
For the record, I want to raise here again our objection to the way this change is planned. We have several use cases in which we have an index with custom types that have custom fields. Types are not pre-defined and fields are not pre-defined. There's high potential of fields with the same name under different types, including fields with the same name that have different field type (and it happens). We have a lot of this kind of data. This modeling was aligned with the way Types where presented by Elasticsearch (and still are - in our last meetup Types where referred to by Boaz as equivalent to "tables"). So with this change we will be enforced to either hard-code the type as a prefix for all fields ourselves, or encapsulate all documents with a root "type" node. It can be done but is ugly, requires patchy handling and also reindexing all data. A better option that will allow such use cases is having a setting on the index level to configure field type isolation. So if we know that we need field type isolation (and we don't have cross-type searches and are willing to pay the overhead), we can set it to "type" level, and internally fields will be prefixed by the type. |
Hi @rore
If you have this, then your data is essentially broken today, and you can end up with incorrect results, exceptions, or even corrupt indices. The first thing that we're trying to do is to make everything safe and predictable. We are leaving the mapping APIs as they are so that, in the future, we may be able to revisit this decision and provide more alternatives. |
+1. This has been a major pain ever since I started using ES, and still happens in 1.6. Deleting the type alone doesn't fix the problem -- I have to completely delete the entire index and re-index everything, and then sometimes the problem goes away. I do set up a strict mapping before I index anything. It really feels to be a timing thing between the shards. |
Maybe? This only impacts users who have the same field name mapped to different data types under different document types in the same index. I don't know how many users this affects. Given Logstash has had this behavior (by default a daily index) for many years and given the anecdote that I can't recall much reports of this problem from Logstash users in those years, I'm not sure how big a problem this will be for Logstash users. I will confess that prior to this ticket, my assumptions were that type mappings were fully independent even if they shared field names (w/ different mappings) - I know accept this as incorrect, but I don't know great an impact this has had against Logstash users at this time.
These are two different things. One constraint "Fields in the same index but on different document types must have the same mapping" is not the same as saying "the same schema must exist for all documents in the same index regardless of type - the conflict is only when two fields occupy the same name but different mappings in one index. More research is needed for the Logstash side of things, but it's possible we may want to change the default index to include the 'type' field from Logstash (it'll be a backwards-compatibility-breaking change, if we do this). Hopefully the script from #10214 will help users figure out how this will impact them before upgrading, and we can address further from there. |
After reading this: https://www.elastic.co/guide/en/elasticsearch/reference/1.3/mapping.html I think this should be taken out of the docs: It looks like this is probably my issue, but I think the documentation should be updated to say that using different mapping characteristics on fields with the same name across multiple types is not supported, because some searches fall apart completely, such as sorting. |
Indexes can have the same field name with a different type. Doesn't this change move the query problem out of the index/type level and up in to the index? It seems to me that I had a query spanning indexes I'd still have the same fieldname+datatype conflict problem. Is there any merit in resolving this as part of the query DSL? If you have conflicting field+datatypes then could the query allow the caller to specify which field+datatype they wanted ignoring docs that don't match the criteria. |
@monowai The important thing about #8871 was making field types consistent within an index. You are correct that across indexes, the problem can still exist. However, whether this is an error case depends on the query. If the query is not parseable in one of the indexes, an exception would be raised. This was already the case, but now it should be consistently raised, while before mixed field types within an index could have masked the problem (depending on the order the document types were loaded within mappings). |
Fields with the same name in different types in the same index should have the same mapping. Previously, this has been advised as "best practice", but relying on advice has proved insufficient (see #4081 for the many issues that have resulted from allowing conflicting field definitions). Instead we need to enforce this in the API.
This issue (which replaces #4081) is a meta issue listing all of the changes that need to be made:
_type
.analyzer
,index_analyzer
,search_analyzer
#8874 Remove type-levelanalyzer
,search_analyzer
,index_analyzer
settings._analyzer
_uid
,_id
,_index
,_type
,_field_names
._routing
and_id
from document fields, but should still be possible to set_routing
to required._timestamp
,_ttl
,_routing
,_parent
as these are required for reindexing._source
field should be stored? Disabling source prevents reindexing, updates, and out-of-the-box highlighting. Perhaps with the new compression options (Addbest_compression
option for indices #8863), the ability to disable the_source
field is less useful?_boost
field #8875 Remove the_boost
field (deprecated in v1.0.0.RC1).index_name
andpath
#6677 Deprecateindex_name
andpath
settingsAlternatives:
login_name
vslogin_date
.The text was updated successfully, but these errors were encountered: