Remove support for types? #15613

jpountz · 2015-12-22T17:16:20Z

The ability to have several types on the same index is causing problems:

the mappings APIs need to maintain one mapping per type, yet those mappings can't be independent and keeping them synchronized is complicated (see eg. discussions on Make mapping updates more robust. #15539)
it gives the feeling that the system can easily deal with documents that have very different mappings in the same index, which is not true. This is why in 2.0 we added more restrictions on mappings across types. In addition types encourage sparsity and sparse fields cause Lucene to either be slow when there is a special impl for the sparse case (eg. doc values) or use tremendous memory and disk space in spite of the fact that few documents have a value (eg. norms, because a fixed amount of memory is used for every doc, regardless of whether they have a value for this field or not).

Migrating existing users is certainly going to be complicated but this would also make the system more honest to new users about the fact that we can't do index-level multi-tenancy efficiently. Also I suspect that the restrictions that we added in 2.0 (that eg. two fields that have the same name in different types) already made lots of users migrate to a single index per data type instead of folding them into different types of the same index.

See also https://www.elastic.co/blog/index-vs-type.

zygfryd · 2015-12-22T21:51:36Z

Seems like removing support for types would be blocked on #11432.

It'd be lovely if this accelerated parent-child relations across indexes though, that'd get rid of a lot of the aforementioned sparsity.

nik9000 · 2015-12-22T21:56:20Z

What about divorcing mappings from types? Make mappings an index level feature and types just kind of like part of the id? Would that make them light enough to not get in the way?

jpountz · 2015-12-22T22:38:30Z

@zygfryd indeed

@nik9000 That is an option too. At least it would make clear that there is a single mapping per index and we could stimplify the internals. With this option, I guess types would remain as first-class filters only (eg. we could do index sorting on _type so that filtering on them would be faster)?

nik9000 · 2015-12-22T22:40:31Z

types would remain as first-class filters only

I'd be fine with that.

ggrossetie · 2015-12-31T15:38:44Z

I think divorcing mappings from types is a good idea. In my opinion, removing types is too radical.

In our use case (logs management) we have 80+ types of logs and we use a (daily, weekly or monthly) index per project/tenant to be able to handle the load.
We do have a different mappings for each type of logs but mappings are identical across all indices.

In addition types encourage sparsity and sparse fields cause Lucene to either be slow when there is a special impl for the sparse case (eg. doc values) or use tremendous memory and disk space in spite of the fact that few documents have a value (eg. norms, because a fixed amount of memory is used for every doc, regardless of whether they have a value for this field or not).

If I read you correctly, we should instead use one index per log type ?
In our use case we have different needs for each project/tenant. Some projects can log as much as 5K logs/sec while others projects only log 10 logs/sec.
So we need to be able to configure replicas/shards and index creation frequency (daily, weekly, monthly...) per project/tenant.

Could you please explain a little bit more your proposal with this use case and with types removed ?

Make mappings an index level feature

👍

jpountz · 2016-05-30T20:49:14Z

If I read you correctly, we should instead use one index per log type ?

Yes. Types are trappy: at first sight they look like an efficient way to have multiple tenants in a single index, but in practice this usually makes things worse than having multiple indices due to the fact that Lucene likes dense data better than sparse data, especially for norms and doc values.

If some tenants have lower indexing rates, they would get fewer shards and/or longer time frames (eg. weekly indices instead of daily).

jpountz · 2016-05-30T20:54:09Z

I first thought divorcing types from mappings would be a good compromise, but types have another issue that they force us to fold the type into the uid, which typically either makes the _uid less efficient (slower indexing and slower gets) if we prepend the type (like today) or more space-intensive if we append the type. So I think we should think about getting rid of types entirely. For instance, maybe we could consider enforcing a single type per index in version X, with APIs still working with either index/type or just index, and then removing types entirely in version X+1?

s1monw · 2016-05-31T07:18:02Z

I think we should deprecate type in 5.0 and start moving towards index level mappings, uuid per index not per type etc. If somebody really needs the type in the UUID they can still do that I guess. Types can be build on top of es without native support, there is nothing today that prevents you from doing this. It rather complicates things on all end internally without real benefit to the outsite except of the first level API support that someone might find useful but is only syntactic sugar with a potential high price to pay. I am also +1 to remove this in 6.0 entirely and guide folks how to do it correctly.

dadoonet · 2016-06-02T08:29:01Z

The main concern I have for now is the support of the parent/child feature.
Today we need to have colocated data within the same shard for parent and children in order to perform joins in memory.

Removing types will only allow to do parent/child using the same "kind" of document.

Not super terrible as at the end of the day this is what is happening behind the scene.

So if we had:

PUT index/company/1
{
  "name": "elastic"
}
PUT index/employee/11?parent=1
{
  "first_name": "david"
}

We will basically have to rewrite this as for example:

PUT index/1
{
  "company": {
    "name": "elastic"
  }
}
PUT index/11?parent=1
{
  "employee": {
    "name": "david"
  }
}

It means that parent/child will be able to do self referencing as proposed here in #11432.
So as soon as we have support for #11432, I'm totally +1 to remove types.

BTW, may be we should already start to educate people to use only one type per index and use data structures similar to what I proposed in my example?

jpountz · 2016-08-05T15:24:36Z

What about the following plan:

require 5.0+ indices to have at most one type (which means parent/child does not work)
add APIs without a type parameter, eg. POST index/ to index with an auto-generated id or PUT index/id to index with an explicit id, remove the requirement to have the type name as a top-level key in mappings, etc. (all this does not need to done in 5.0, this could come in 5.x)
remove types from APIs in 6.0+
separately work on another way to expose joins in a reasonable way, as a replacement for parent/child (assuming we want a replacement)

If we are not ready to drop parent/child right now, one trade-off I could consent would be to have a setting that allows indices to have multiple types so that parent/child can be used, but these indices could not be upgraded to 6.0.

For the record, we have some evidence that removing types could help indexing speed quite significantly since we would not have to fold the type name into the uid: #18154 (comment)

Thoughts?

rjernst · 2016-08-11T16:24:40Z

@jpountz I think we should do this, but it seems your proposal has gone unnoticed given the lack of reaction (positive or negative). Can we get some other thoughts on this?

/cc @s1monw @clintongormley

clintongormley · 2016-08-11T16:26:01Z

@rjernst i'm staring at it as you type :)

clintongormley · 2016-08-12T12:42:15Z

Currently both `PUT` and `POST` can be used to create indices. This commit removes support for `POST index_name` so that we can use it to index documents with auto-generated ids once types are removed. Relates elastic#15613

jpountz · 2016-08-18T07:31:37Z

Check other REST endpoints for similar clashes.

I think the only other endpoint we need to check is PUT/POST {index}/_mapping, which is free today. Am I missing another one?

clintongormley · 2016-08-18T07:47:44Z

indices.exists_type (HEAD {index}/{type}) clashes with a typeless exists (HEAD {index}/{id})

Perhaps the existing form should be deprecated in favour of HEAD {index}/_mapping/{type}

I think that's the lot

jpountz · 2016-08-18T09:59:58Z

In new indices, enforce that fields with the same name in different types must have identical mappings.

Having started to work on it, this is more challenging than I initially thought. However I think this might not be needed: since we will require at most one type in 6.0+ anyway, we will not have to merge mappings across types in 7.0, so this step is not required for the type removal?

…/{type}`. elastic#20055 This will help remove types as we will need `{index}/{id}` to tell whether a document exists. Relates elastic#15613

kimchy · 2016-08-22T09:41:02Z

@clintongormley I like the purposed plan! If we can find a way to support the new parent/child format in 5.x with the enabled set to false on _type, it would mean simpler migration down the road. We could start to push for setting this setting for new users.

jpountz · 2018-09-19T14:50:04Z

We just discussed how we want to update REST tests with this change. The issue is that index creation, index, update, put mapping and some other APIs are going to complain with version 7.x if the include_type_name parameter is not set. Yet we don't want to skip all tests that use these APIs if the version if < 7.0 as this would disable most tests when testing clusters that run multiple versions. As a consequence, we agreed on the following plan:

Use _doc as a type name all the time.
Modify the test runner to ignore warnings about include_type_name not being set. This will allow multi-version (6.x and 7.0) cluster tests to keep running like today. We did something similar when moving from 5 shards to 1 shard by default (Default to one shard #30539).
Duplicate REST tests for APIs that now support the include_type_name parameter in order to test the behavior both when include_type_name is set and when it is not set. This should only be done on tests that are dedicated to testing a specific API. For instance, if index creation is used as a way to setup a tests, it should not be duplicated while if index creation itself is tested, it should be duplicated.

This commit duplicates REST tests for the - `indices.create` - `indices.put_mapping` - `indices.get_mapping` - `index` - `get` - `delete` - `update` - `bulk` APIs, so that we both test them when used without types (include_type_name=false) and with types, mostly for mixed-version cluster tests. Given a suite called `X_test_name.yml`, I first copied it to `(X+1)_test_name_legacy.yml` and then changed `X_test_name.yml` to set `include_type_name=false` on every API that supports it. Relates elastic#15613

A number of APIs (index creation, put mapping, index, etc.) will soon trigger deprecation warnings unless users opt in for typeless APIs by passing `include_type_name=false`. Relates elastic#15613

jpountz · 2018-09-24T13:40:28Z

I updated the plan to add a new item to the 7.0 tasks list: "remove references to types from the high-level rest client API".

I'm starting with delete since it is a bit simpler than other APIs, but we should eventually do the same with all other APIs that are being replaced with a typeless version (get, put_mapping, search, etc.). Relates elastic#15613

jpountz · 2018-09-26T08:28:04Z

After seeing #33953 @jtibshirani raised the question of whether we want to do something so that users don't have to pass include_type_name=false to almost every request on 7.x.

It's true that for someone who would quickly resolve deprecation warnings and use the new typeless APIs, having to keep passing include_type_name=false to almost every request looks a bit ugly. Worse, new users would have to pass include_type_name=false as well only to stop doing it when they upgrade to 8.0. This isn't something that can't be coped with but maybe we could think about how we can make the experience better?

I don't think we should default include_type_name to true, this is going to be too surprising and will cause issues with mixed-version clusters. I also think it is important to give users a whole major version to upgrade: this is a major breaking change to our most used APIs (index creation, indexing of documents, search to name a few).

I have been considering adding a node setting, eg. rest.action.include_type_name, to change the default of include_type_name for all APIs, which would essentially make APIs behave like in 8.0. 7.x docs would still pass include_type_name=true to all affected APIs since we shouldn't expect users to have opted in for that behavior, but we could add notes that they could skip setting this on almost every request thanks to this new node setting.

Opinions / other suggestions? /cc @clintongormley @rjernst

clintongormley · 2018-09-26T09:19:47Z

I have been considering adding a node setting, eg. rest.action.include_type_name, to change the default of include_type_name for all APIs, which would essentially make APIs behave like in 8.0.

I like this idea

This commit duplicates REST tests for the - `indices.create` - `indices.put_mapping` - `indices.get_mapping` - `index` - `get` - `delete` - `update` - `bulk` APIs, so that we both test them when used without types (include_type_name=false) and with types, mostly for mixed-version cluster tests. Given a suite called `X_test_name.yml`, I first copied it to `(X+1)_test_name_with_types.yml` and then changed `X_test_name.yml` to set `include_type_name=false` on every API that supports it. Relates #15613

jtibshirani · 2018-09-26T17:18:39Z

@jpountz to clarify, when you refer to include_type_name: true in the last two paragraphs, do you mean false?

Would it be possible to default the setting to false for new clusters, so that new users in 7.0 never come across the extra parameter? For customers upgrading from a previous version, the setting would instead be true by default. I’m just trying to brainstorm a way to keep the out-of-the-box experience nice for new users, and don’t have a perfect understanding of the cluster upgrade process/ details of running mixed version clusters.

This commit duplicates REST tests for the - `indices.create` - `indices.put_mapping` - `indices.get_mapping` - `index` - `get` - `delete` - `update` - `bulk` APIs, so that we both test them when used without types (include_type_name=false) and with types, mostly for mixed-version cluster tests. Given a suite called `X_test_name.yml`, I first copied it to `(X+1)_test_name_with_types.yml` and then changed `X_test_name.yml` to set `include_type_name=false` on every API that supports it. Relates #15613

jtibshirani · 2018-11-13T00:39:46Z

An update: we met offline to talk through a revised plan for 7.0, which is now documented in #35190. The core of the plan is set, but there are still some open questions to sort out.

This adds an `include_type_name` option to the `indices.create`, `indices.get_mapping` and `indices.put_mapping` APIs, which defaults to `true`. When set to `false`, then mappings will be returned directly in the body of the `indices.get_mapping` API, without keying them by the type name, the `indices.create` will expect mappings directly under the `mappings` key, and the `indices.put_mapping` will use `_doc` as a type name and fail if a `type` is provided explicitly. On 5.x indices, get-mapping will fail if the index has multiple mappings, and put-mapping will update or introduce mappings for the `_doc` type instead of updating existing mappings. This oddity is required so that we don't have to introduce a new flag to put-mapping requests to know whether they are actually updating the `_doc` type or performing a typeless call. Relates elastic#15613

This adds an `include_type_name` option to the `indices.create`, `indices.get_mapping` and `indices.put_mapping` APIs, which defaults to `true`. When set to `false`, then mappings will be returned directly in the body of the `indices.get_mapping` API, without keying them by the type name, the `indices.create` will expect mappings directly under the `mappings` key, and the `indices.put_mapping` will use `_doc` as a type name and fail if a `type` is provided explicitly. On 5.x indices, get-mapping will fail if the index has multiple mappings, and put-mapping will update or introduce mappings for the `_doc` type instead of updating existing mappings. This oddity is required so that we don't have to introduce a new flag to put-mapping requests to know whether they are actually updating the `_doc` type or performing a typeless call. Relates #15613

cjcenizal · 2020-02-25T19:52:37Z

@jpountz Can we close this issue now that types have been deprecated?

jimczi · 2020-03-12T11:40:51Z

We're tracking the remaining work for the types removal in another meta issue now so I am closing this one. Long live the types...

penfree · 2020-04-17T12:08:16Z

@jpountz @jimczi I am confused on this issue, would you please help to explain it for me?

If i have one index with 3 types, like:

type1: f1: integer, f2: integer, f3: text
type2: f4: integer, f5: text

then es(before version 5.x) will has 5 fields for every document in the index, which will lead to a sparse index. i'm thinking about a solution by make a transformation when create mapping like:

type1: integer_field_1, integer_field2, text_field_1
type2: integer_field_1, text_field_1

so the index will only has 3 fields:

integer_field_1: type1.f1, type2.f4
interger_field_2: type1.f2
text_field_1: type1.f3, type2.f5

when query the index, just do the same transformation to convert the field in query to real field stored in index. Can this solve the problem of sparse index? So that we can keep multiple type to facilitate data modeling.

jpountz · 2020-04-24T21:07:58Z

@penfree This would work, but wouldn't it be even easier to have 2 indices?

penfree · 2020-05-19T09:31:19Z

@penfree This would work, but wouldn't it be even easier to have 2 indices?

multiple doctype make it much more easier to design data model.

An example:
our data is case of patient, it contains multiple doc type, like

patient info
- clinic visit info
  - diagnosis
  - lab report
  - exam report

we should make a query on diagnosis or lab report to search clinic visit or patient, it is difficult to do it in multiple indices. Not to mention that we create an index per day and place them in an alias。

jpountz added the discuss label Dec 22, 2015

jpountz mentioned this issue Dec 31, 2015

Return an aggregated view of all mappings/properties of all types #15728

Closed

clintongormley added the :Search Foundations/Mapping Index mappings, including merging and defining field types label Jan 10, 2016

jpountz mentioned this issue Jan 25, 2016

Allow deletion of document types from an index #16207

Closed

rjernst added blocker v5.0.0-beta1 labels Aug 11, 2016

jpountz mentioned this issue Aug 16, 2016

Only use PUT for index creation, not POST. #20001

Merged

jpountz mentioned this issue Aug 18, 2016

Switch indices.exists_type from {index}/{type} to {index}/_mapping/{type}. #20055

Merged

martijnvg mentioned this issue Aug 31, 2016

Typeless parent/child #20257

Closed

jpountz mentioned this issue Sep 12, 2018

RestHighLevelClient should implement type-less APIs #33625

Closed

7 tasks

jpountz mentioned this issue Sep 21, 2018

REST test for typeless APIs. #33934

Merged

jpountz mentioned this issue Sep 21, 2018

Start moving docs to typeless APIs. #33953

Closed

jpountz mentioned this issue Sep 25, 2018

Make the high-level REST client skip types for document deletion. #34041

Closed

This was referenced Nov 2, 2018

Implementation tracking for 7.0 types deprecation. #35190

Closed

Ignore warnings related to types deprecation in REST tests. #35395

Merged

ghost mentioned this issue Nov 12, 2019

Service names for IndexService instances are undocumented and confusing ongr-io/ElasticsearchBundle#890

Open

jimczi closed this as completed Mar 12, 2020

javanna added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Jul 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove support for types? #15613

Remove support for types? #15613

jpountz commented Dec 22, 2015

zygfryd commented Dec 22, 2015

nik9000 commented Dec 22, 2015

jpountz commented Dec 22, 2015

nik9000 commented Dec 22, 2015

ggrossetie commented Dec 31, 2015

jpountz commented May 30, 2016

jpountz commented May 30, 2016

s1monw commented May 31, 2016

dadoonet commented Jun 2, 2016 •

edited

Loading

jpountz commented Aug 5, 2016

rjernst commented Aug 11, 2016 •

edited

Loading

clintongormley commented Aug 11, 2016

clintongormley commented Aug 12, 2016 •

edited by jtibshirani

Loading

jpountz commented Aug 18, 2016

clintongormley commented Aug 18, 2016

jpountz commented Aug 18, 2016

kimchy commented Aug 22, 2016

jpountz commented Sep 19, 2018

jpountz commented Sep 24, 2018

jpountz commented Sep 26, 2018

clintongormley commented Sep 26, 2018

jtibshirani commented Sep 26, 2018 •

edited

Loading

jtibshirani commented Nov 13, 2018 •

edited

Loading

cjcenizal commented Feb 25, 2020

jimczi commented Mar 12, 2020

penfree commented Apr 17, 2020 •

edited

Loading

jpountz commented Apr 24, 2020

penfree commented May 19, 2020

Remove support for types? #15613

Remove support for types? #15613

Comments

jpountz commented Dec 22, 2015

zygfryd commented Dec 22, 2015

nik9000 commented Dec 22, 2015

jpountz commented Dec 22, 2015

nik9000 commented Dec 22, 2015

ggrossetie commented Dec 31, 2015

jpountz commented May 30, 2016

jpountz commented May 30, 2016

s1monw commented May 31, 2016

dadoonet commented Jun 2, 2016 • edited Loading

jpountz commented Aug 5, 2016

rjernst commented Aug 11, 2016 • edited Loading

clintongormley commented Aug 11, 2016

clintongormley commented Aug 12, 2016 • edited by jtibshirani Loading

Where we want to get to:

Proposed path:

jpountz commented Aug 18, 2016

clintongormley commented Aug 18, 2016

jpountz commented Aug 18, 2016

kimchy commented Aug 22, 2016

jpountz commented Sep 19, 2018

jpountz commented Sep 24, 2018

jpountz commented Sep 26, 2018

clintongormley commented Sep 26, 2018

jtibshirani commented Sep 26, 2018 • edited Loading

jtibshirani commented Nov 13, 2018 • edited Loading

cjcenizal commented Feb 25, 2020

jimczi commented Mar 12, 2020

penfree commented Apr 17, 2020 • edited Loading

jpountz commented Apr 24, 2020

penfree commented May 19, 2020

dadoonet commented Jun 2, 2016 •

edited

Loading

rjernst commented Aug 11, 2016 •

edited

Loading

clintongormley commented Aug 12, 2016 •

edited by jtibshirani

Loading

jtibshirani commented Sep 26, 2018 •

edited

Loading

jtibshirani commented Nov 13, 2018 •

edited

Loading

penfree commented Apr 17, 2020 •

edited

Loading