Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove support for types? #15613

Closed
jpountz opened this issue Dec 22, 2015 · 75 comments
Closed

Remove support for types? #15613

jpountz opened this issue Dec 22, 2015 · 75 comments
Labels
Meta :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@jpountz
Copy link
Contributor

jpountz commented Dec 22, 2015

The ability to have several types on the same index is causing problems:

  • the mappings APIs need to maintain one mapping per type, yet those mappings can't be independent and keeping them synchronized is complicated (see eg. discussions on Make mapping updates more robust. #15539)
  • it gives the feeling that the system can easily deal with documents that have very different mappings in the same index, which is not true. This is why in 2.0 we added more restrictions on mappings across types. In addition types encourage sparsity and sparse fields cause Lucene to either be slow when there is a special impl for the sparse case (eg. doc values) or use tremendous memory and disk space in spite of the fact that few documents have a value (eg. norms, because a fixed amount of memory is used for every doc, regardless of whether they have a value for this field or not).

Migrating existing users is certainly going to be complicated but this would also make the system more honest to new users about the fact that we can't do index-level multi-tenancy efficiently. Also I suspect that the restrictions that we added in 2.0 (that eg. two fields that have the same name in different types) already made lots of users migrate to a single index per data type instead of folding them into different types of the same index.

See also https://www.elastic.co/blog/index-vs-type.

@zygfryd
Copy link

zygfryd commented Dec 22, 2015

Seems like removing support for types would be blocked on #11432.

It'd be lovely if this accelerated parent-child relations across indexes though, that'd get rid of a lot of the aforementioned sparsity.

@nik9000
Copy link
Member

nik9000 commented Dec 22, 2015

What about divorcing mappings from types? Make mappings an index level feature and types just kind of like part of the id? Would that make them light enough to not get in the way?

@jpountz
Copy link
Contributor Author

jpountz commented Dec 22, 2015

@zygfryd indeed

@nik9000 That is an option too. At least it would make clear that there is a single mapping per index and we could stimplify the internals. With this option, I guess types would remain as first-class filters only (eg. we could do index sorting on _type so that filtering on them would be faster)?

@nik9000
Copy link
Member

nik9000 commented Dec 22, 2015

types would remain as first-class filters only

I'd be fine with that.

@ggrossetie
Copy link
Contributor

I think divorcing mappings from types is a good idea. In my opinion, removing types is too radical.

In our use case (logs management) we have 80+ types of logs and we use a (daily, weekly or monthly) index per project/tenant to be able to handle the load.
We do have a different mappings for each type of logs but mappings are identical across all indices.

In addition types encourage sparsity and sparse fields cause Lucene to either be slow when there is a special impl for the sparse case (eg. doc values) or use tremendous memory and disk space in spite of the fact that few documents have a value (eg. norms, because a fixed amount of memory is used for every doc, regardless of whether they have a value for this field or not).

If I read you correctly, we should instead use one index per log type ?
In our use case we have different needs for each project/tenant. Some projects can log as much as 5K logs/sec while others projects only log 10 logs/sec.
So we need to be able to configure replicas/shards and index creation frequency (daily, weekly, monthly...) per project/tenant.

Could you please explain a little bit more your proposal with this use case and with types removed ?

Make mappings an index level feature

👍

@clintongormley clintongormley added the :Search Foundations/Mapping Index mappings, including merging and defining field types label Jan 10, 2016
@jpountz
Copy link
Contributor Author

jpountz commented May 30, 2016

If I read you correctly, we should instead use one index per log type ?

Yes. Types are trappy: at first sight they look like an efficient way to have multiple tenants in a single index, but in practice this usually makes things worse than having multiple indices due to the fact that Lucene likes dense data better than sparse data, especially for norms and doc values.

If some tenants have lower indexing rates, they would get fewer shards and/or longer time frames (eg. weekly indices instead of daily).

@jpountz
Copy link
Contributor Author

jpountz commented May 30, 2016

I first thought divorcing types from mappings would be a good compromise, but types have another issue that they force us to fold the type into the uid, which typically either makes the _uid less efficient (slower indexing and slower gets) if we prepend the type (like today) or more space-intensive if we append the type. So I think we should think about getting rid of types entirely. For instance, maybe we could consider enforcing a single type per index in version X, with APIs still working with either index/type or just index, and then removing types entirely in version X+1?

@s1monw
Copy link
Contributor

s1monw commented May 31, 2016

I think we should deprecate type in 5.0 and start moving towards index level mappings, uuid per index not per type etc. If somebody really needs the type in the UUID they can still do that I guess. Types can be build on top of es without native support, there is nothing today that prevents you from doing this. It rather complicates things on all end internally without real benefit to the outsite except of the first level API support that someone might find useful but is only syntactic sugar with a potential high price to pay. I am also +1 to remove this in 6.0 entirely and guide folks how to do it correctly.

@dadoonet
Copy link
Member

dadoonet commented Jun 2, 2016

The main concern I have for now is the support of the parent/child feature.
Today we need to have colocated data within the same shard for parent and children in order to perform joins in memory.

Removing types will only allow to do parent/child using the same "kind" of document.

Not super terrible as at the end of the day this is what is happening behind the scene.

So if we had:

PUT index/company/1
{
  "name": "elastic"
}
PUT index/employee/11?parent=1
{
  "first_name": "david"
}

We will basically have to rewrite this as for example:

PUT index/1
{
  "company": {
    "name": "elastic"
  }
}
PUT index/11?parent=1
{
  "employee": {
    "name": "david"
  }
}

It means that parent/child will be able to do self referencing as proposed here in #11432.
So as soon as we have support for #11432, I'm totally +1 to remove types.

BTW, may be we should already start to educate people to use only one type per index and use data structures similar to what I proposed in my example?

@jpountz
Copy link
Contributor Author

jpountz commented Aug 5, 2016

What about the following plan:

  • require 5.0+ indices to have at most one type (which means parent/child does not work)
  • add APIs without a type parameter, eg. POST index/ to index with an auto-generated id or PUT index/id to index with an explicit id, remove the requirement to have the type name as a top-level key in mappings, etc. (all this does not need to done in 5.0, this could come in 5.x)
  • remove types from APIs in 6.0+
  • separately work on another way to expose joins in a reasonable way, as a replacement for parent/child (assuming we want a replacement)

If we are not ready to drop parent/child right now, one trade-off I could consent would be to have a setting that allows indices to have multiple types so that parent/child can be used, but these indices could not be upgraded to 6.0.

For the record, we have some evidence that removing types could help indexing speed quite significantly since we would not have to fold the type name into the uid: #18154 (comment)

Thoughts?

@rjernst
Copy link
Member

rjernst commented Aug 11, 2016

@jpountz I think we should do this, but it seems your proposal has gone unnoticed given the lack of reaction (positive or negative). Can we get some other thoughts on this?

/cc @s1monw @clintongormley

@clintongormley
Copy link
Contributor

@rjernst i'm staring at it as you type :)

@clintongormley
Copy link
Contributor

clintongormley commented Aug 12, 2016

We discussed this in Fix it Friday.

Where we want to get to:

We want to remove the concept of types from Elasticsearch, while still supporting parent/child.

  • Field mappings will be at the top level, instead of under type names
  • The _uid will consist only of the _id, not type#id.
  • Creating a document with ID : PUT index/_doc/id
  • Creating a document with autogenerated ID : POST index/_doc

It's very important to me that we don't leave users behind - we need to give them a smooth upgrade and transition path.

Proposed path:

In 5.0:

In 5.x:

  • Add support for setting enabled:false to the _type field, which will exclude type from the UID and use a fixed type name for (eg) mapping requests/responses. These indices will not support parent/child. Only allow one type on 6.0 indices #24317
  • Add support for PUT index/_doc/id, POST index/_doc, and PUT|POST index/_mapping
  • Add new mechanism for specifying and supporting parent/child.

In 6.0:

  • Prevent new indices from being created with more than one type.

In 6.7:

  • For APIs whose request/ response structure changes with types removal (create index, get mapping, etc.), add a query string parameter (eg include_type_name) which indicates whether the requests/responses should include a type name. This parameter defaults to true. Issue a deprecation warning for all requests that don't set include_type_name to false

In 7.0:

  • Remove support for old parent/child (Remove legacy mapping code. #29224)
  • For requests whose structure does not change with types removal, make sure to accept both typed + typeless versions of the API. Issue a deprecation warning for requests that specify types.
  • Default include_type_name to false. Issue a deprecation warning for all requests that include_type_name.
  • Deprecate references to types in the Java high-level rest client

In 8.0

  • type can no longer be specified in requests
  • the include_type_name parameter is removed

In 6.0, all existing types from 5.x indices will have identical mappings. We will still have indices with old parent/child implementation. If we can migrate existing parent/child settings to the new settings, then we could move the "return fields at top level" issue into 6.0.

Alternatively, we could return fields at the top level in 6.0 regardless, and still show types (for old indices with types enabled, or with old parent-child) as a separate section in GET mapping.

UPDATED TO REFLECT CHANGES IN #15613 (comment)

UPDATED TO REFLECT CHANGES IN #15613 (comment)

UPDATED TO REFLECT CHANGES IN #15613 (comment)

UPDATED TO REFLECT CHANGES DISCUSSED IN #35190

jpountz added a commit to jpountz/elasticsearch that referenced this issue Aug 17, 2016
Currently both `PUT` and `POST` can be used to create indices. This commit
removes support for `POST index_name` so that we can use it to index documents
with auto-generated ids once types are removed.

Relates elastic#15613
@jpountz
Copy link
Contributor Author

jpountz commented Aug 18, 2016

Check other REST endpoints for similar clashes.

I think the only other endpoint we need to check is PUT/POST {index}/_mapping, which is free today. Am I missing another one?

@clintongormley
Copy link
Contributor

indices.exists_type (HEAD {index}/{type}) clashes with a typeless exists (HEAD {index}/{id})

Perhaps the existing form should be deprecated in favour of HEAD {index}/_mapping/{type}

I think that's the lot

@jpountz
Copy link
Contributor Author

jpountz commented Aug 18, 2016

In new indices, enforce that fields with the same name in different types must have identical mappings.

Having started to work on it, this is more challenging than I initially thought. However I think this might not be needed: since we will require at most one type in 6.0+ anyway, we will not have to merge mappings across types in 7.0, so this step is not required for the type removal?

jpountz added a commit to jpountz/elasticsearch that referenced this issue Aug 19, 2016
…/{type}`. elastic#20055

This will help remove types as we will need `{index}/{id}` to tell whether a
document exists.

Relates elastic#15613
@kimchy
Copy link
Member

kimchy commented Aug 22, 2016

@clintongormley I like the purposed plan! If we can find a way to support the new parent/child format in 5.x with the enabled set to false on _type, it would mean simpler migration down the road. We could start to push for setting this setting for new users.

@jpountz
Copy link
Contributor Author

jpountz commented Sep 19, 2018

We just discussed how we want to update REST tests with this change. The issue is that index creation, index, update, put mapping and some other APIs are going to complain with version 7.x if the include_type_name parameter is not set. Yet we don't want to skip all tests that use these APIs if the version if < 7.0 as this would disable most tests when testing clusters that run multiple versions. As a consequence, we agreed on the following plan:

  • Use _doc as a type name all the time.
  • Modify the test runner to ignore warnings about include_type_name not being set. This will allow multi-version (6.x and 7.0) cluster tests to keep running like today. We did something similar when moving from 5 shards to 1 shard by default (Default to one shard #30539).
  • Duplicate REST tests for APIs that now support the include_type_name parameter in order to test the behavior both when include_type_name is set and when it is not set. This should only be done on tests that are dedicated to testing a specific API. For instance, if index creation is used as a way to setup a tests, it should not be duplicated while if index creation itself is tested, it should be duplicated.

jpountz added a commit to jpountz/elasticsearch that referenced this issue Sep 21, 2018
This commit duplicates REST tests for the
 - `indices.create`
 - `indices.put_mapping`
 - `indices.get_mapping`
 - `index`
 - `get`
 - `delete`
 - `update`
 - `bulk`
APIs, so that we both test them when used without types (include_type_name=false)
and with types, mostly for mixed-version cluster tests.

Given a suite called `X_test_name.yml`, I first copied it to
`(X+1)_test_name_legacy.yml` and then changed `X_test_name.yml` to set
`include_type_name=false` on every API that supports it.

Relates elastic#15613
jpountz added a commit to jpountz/elasticsearch that referenced this issue Sep 21, 2018
A number of APIs (index creation, put mapping, index, etc.) will soon trigger
deprecation warnings unless users opt in for typeless APIs by passing
`include_type_name=false`.

Relates elastic#15613
@jpountz
Copy link
Contributor Author

jpountz commented Sep 24, 2018

I updated the plan to add a new item to the 7.0 tasks list: "remove references to types from the high-level rest client API".

jpountz added a commit to jpountz/elasticsearch that referenced this issue Sep 25, 2018
I'm starting with delete since it is a bit simpler than other APIs, but we
should eventually do the same with all other APIs that are being replaced with
a typeless version (get, put_mapping, search, etc.).

Relates elastic#15613
@jpountz
Copy link
Contributor Author

jpountz commented Sep 26, 2018

After seeing #33953 @jtibshirani raised the question of whether we want to do something so that users don't have to pass include_type_name=false to almost every request on 7.x.

It's true that for someone who would quickly resolve deprecation warnings and use the new typeless APIs, having to keep passing include_type_name=false to almost every request looks a bit ugly. Worse, new users would have to pass include_type_name=false as well only to stop doing it when they upgrade to 8.0. This isn't something that can't be coped with but maybe we could think about how we can make the experience better?

I don't think we should default include_type_name to true, this is going to be too surprising and will cause issues with mixed-version clusters. I also think it is important to give users a whole major version to upgrade: this is a major breaking change to our most used APIs (index creation, indexing of documents, search to name a few).

I have been considering adding a node setting, eg. rest.action.include_type_name, to change the default of include_type_name for all APIs, which would essentially make APIs behave like in 8.0. 7.x docs would still pass include_type_name=true to all affected APIs since we shouldn't expect users to have opted in for that behavior, but we could add notes that they could skip setting this on almost every request thanks to this new node setting.

Opinions / other suggestions? /cc @clintongormley @rjernst

@clintongormley
Copy link
Contributor

I have been considering adding a node setting, eg. rest.action.include_type_name, to change the default of include_type_name for all APIs, which would essentially make APIs behave like in 8.0.

I like this idea

jpountz added a commit that referenced this issue Sep 26, 2018
This commit duplicates REST tests for the
 - `indices.create`
 - `indices.put_mapping`
 - `indices.get_mapping`
 - `index`
 - `get`
 - `delete`
 - `update`
 - `bulk`
APIs, so that we both test them when used without types (include_type_name=false)
and with types, mostly for mixed-version cluster tests.

Given a suite called `X_test_name.yml`, I first copied it to
`(X+1)_test_name_with_types.yml` and then changed `X_test_name.yml` to set
`include_type_name=false` on every API that supports it.

Relates #15613
@jtibshirani
Copy link
Contributor

jtibshirani commented Sep 26, 2018

@jpountz to clarify, when you refer to include_type_name: true in the last two paragraphs, do you mean false?

Would it be possible to default the setting to false for new clusters, so that new users in 7.0 never come across the extra parameter? For customers upgrading from a previous version, the setting would instead be true by default. I’m just trying to brainstorm a way to keep the out-of-the-box experience nice for new users, and don’t have a perfect understanding of the cluster upgrade process/ details of running mixed version clusters.

kcm pushed a commit that referenced this issue Oct 30, 2018
This commit duplicates REST tests for the
 - `indices.create`
 - `indices.put_mapping`
 - `indices.get_mapping`
 - `index`
 - `get`
 - `delete`
 - `update`
 - `bulk`
APIs, so that we both test them when used without types (include_type_name=false)
and with types, mostly for mixed-version cluster tests.

Given a suite called `X_test_name.yml`, I first copied it to
`(X+1)_test_name_with_types.yml` and then changed `X_test_name.yml` to set
`include_type_name=false` on every API that supports it.

Relates #15613
@jtibshirani
Copy link
Contributor

jtibshirani commented Nov 13, 2018

An update: we met offline to talk through a revised plan for 7.0, which is now documented in #35190. The core of the plan is set, but there are still some open questions to sort out.

jpountz added a commit to jpountz/elasticsearch that referenced this issue Jan 4, 2019
This adds an `include_type_name` option to the `indices.create`,
`indices.get_mapping` and `indices.put_mapping` APIs, which defaults to `true`.
When set to `false`, then mappings will be returned directly in the body of
the `indices.get_mapping` API, without keying them by the type name, the
`indices.create` will expect mappings directly under the `mappings` key, and
the `indices.put_mapping` will use `_doc` as a type name and fail if a `type`
is provided explicitly.

On 5.x indices, get-mapping will fail if the index has multiple mappings, and
put-mapping will update or introduce mappings for the `_doc` type instead of
updating existing mappings. This oddity is required so that we don't have to
introduce a new flag to put-mapping requests to know whether they are actually
updating the `_doc` type or performing a typeless call.

Relates elastic#15613
jtibshirani pushed a commit that referenced this issue Jan 9, 2019
This adds an `include_type_name` option to the `indices.create`,
`indices.get_mapping` and `indices.put_mapping` APIs, which defaults to `true`.
When set to `false`, then mappings will be returned directly in the body of
the `indices.get_mapping` API, without keying them by the type name, the
`indices.create` will expect mappings directly under the `mappings` key, and
the `indices.put_mapping` will use `_doc` as a type name and fail if a `type`
is provided explicitly.

On 5.x indices, get-mapping will fail if the index has multiple mappings, and
put-mapping will update or introduce mappings for the `_doc` type instead of
updating existing mappings. This oddity is required so that we don't have to
introduce a new flag to put-mapping requests to know whether they are actually
updating the `_doc` type or performing a typeless call.

Relates #15613
@cjcenizal
Copy link
Contributor

@jpountz Can we close this issue now that types have been deprecated?

@jimczi
Copy link
Contributor

jimczi commented Mar 12, 2020

We're tracking the remaining work for the types removal in another meta issue now so I am closing this one. Long live the types...

@jimczi jimczi closed this as completed Mar 12, 2020
@penfree
Copy link

penfree commented Apr 17, 2020

@jpountz @jimczi I am confused on this issue, would you please help to explain it for me?

If i have one index with 3 types, like:

  • type1: f1: integer, f2: integer, f3: text
  • type2: f4: integer, f5: text

then es(before version 5.x) will has 5 fields for every document in the index, which will lead to a sparse index. i'm thinking about a solution by make a transformation when create mapping like:

  • type1: integer_field_1, integer_field2, text_field_1
  • type2: integer_field_1, text_field_1

so the index will only has 3 fields:

  • integer_field_1: type1.f1, type2.f4
  • interger_field_2: type1.f2
  • text_field_1: type1.f3, type2.f5

when query the index, just do the same transformation to convert the field in query to real field stored in index. Can this solve the problem of sparse index? So that we can keep multiple type to facilitate data modeling.

@jpountz
Copy link
Contributor Author

jpountz commented Apr 24, 2020

@penfree This would work, but wouldn't it be even easier to have 2 indices?

@penfree
Copy link

penfree commented May 19, 2020

@penfree This would work, but wouldn't it be even easier to have 2 indices?

multiple doctype make it much more easier to design data model.

An example:
our data is case of patient, it contains multiple doc type, like

  • patient info
    • clinic visit info
      • diagnosis
      • lab report
      • exam report

we should make a query on diagnosis or lab report to search clinic visit or patient, it is difficult to do it in multiple indices. Not to mention that we create an index per day and place them in an alias。

@javanna javanna added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Jul 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Meta :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch
Projects
None yet
Development

No branches or pull requests