Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate _timestamp and _ttl into _expiration #9679

Closed
rjernst opened this issue Feb 12, 2015 · 14 comments
Closed

Consolidate _timestamp and _ttl into _expiration #9679

rjernst opened this issue Feb 12, 2015 · 14 comments
Assignees
Labels
>breaking help wanted adoptme :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@rjernst
Copy link
Member

rjernst commented Feb 12, 2015

Currently, the only purpose of the _timestamp field is to allow using a time other than NOW for use with _ttl. In #9058 we are working on simplifying where the explicit timestamp is specified.

I propose we actually have a single field, _expiration, which takes the behavior of both. Internally, _ttl already indexes the actual expiration time, so that it can be queried later to do the deletes. _expiration would be a much better and direct name for this field, and can have the following settings:

  • required - do we enforce expiration is set in indexing requests?
  • default_ttl - same as the default in _ttl now (cannot be used with required)

The parameters for indexing could then be expiration which is an explicit datetime to expire, or ttl which has the same behavior as today. One and only one of these parameters would be allowed, and one would be required if the required flag was set.

@rjernst rjernst added :Search Foundations/Mapping Index mappings, including merging and defining field types >breaking discuss labels Feb 12, 2015
@rjernst
Copy link
Member Author

rjernst commented Feb 12, 2015

Oh, and I forgot the most important setting: enabled. :)

@clintongormley
Copy link
Contributor

I think this is missing the case where users would like an automated timestamp representing NOW to be added to each doc, but not for the purpose of expiration.

@jpountz
Copy link
Contributor

jpountz commented Feb 12, 2015

This sounds good to me. I think it's important to have as few root mappers as necessary and to lock them down as much as we can.

This makes me wonder if we actually need settings besides default_ttl: a negative value could mean to not apply ttls by default and then we would only expire documents if either a ttl has been specified or default_ttl is set? I understand it does a bit less than your suggestion, but it is appealing to have a single setting.

@rjernst
Copy link
Member Author

rjernst commented Feb 12, 2015

@clintongormley A meta field doesn't seem like the right place to have that functionality? It seems like having a date field with null value of NOW would be the right way to allow that (if that doesn't already work, we should make it work).

@jpountz I think having enabled is good to match the setup of other meta fields. I also think having the required flag isn't too complex, but I'd be happy with either of these models: whatever simplifies how to use this feature.

@clintongormley
Copy link
Contributor

@rjernst if it is not a meta field, then the only place to store it would be in the _source, and we don't change the original _source as a rule.

@clintongormley clintongormley self-assigned this Feb 13, 2015
@rmuir
Copy link
Contributor

rmuir commented Feb 26, 2015

I think this is missing the case where users would like an automated timestamp representing NOW to be added to each doc, but not for the purpose of expiration.

I don't think this should keep us in the "business" of maintaining a system _timestamp field. Why do we do this? It just makes the system complicated. Just let users configure their own date fields.

If we need a fieldtype that can 'set itself from NOW' for esoteric cases, why can't that be a separate one, maybe even a plugin.

we can't let the 1% screw over the 99%

@rjernst
Copy link
Member Author

rjernst commented Feb 26, 2015

Also, having a single auto generated timestamp field is very limited. What if you want auto NOW at doc creation, and another field for doc updated? This is typical in dbs, and it seems silly to limit to just one of those, if we are to support it (which we maybe should, but as part of date fields, not a special meta field).

@clintongormley
Copy link
Contributor

New proposal:

Remove the _timestamp field

The only added purpose to the timestamp field is to allow a timestamp to be auto-generated. The user may well want more than one auto-generated field, so we shouldn't restrict this ability to just a single field.

Add auto-generated timestamps to ordinary date fields

We could allow the null_value for date fields to use date maths, eg:

"last_modified": {
  "type": "date",
  "format": "YYYY-MM-dd",
  "null_value": "now"
}

Or alternatively, even accept date maths directly in the _source:

PUT my_index/my_type/1
{
  "title": "My title",
  "last_modified": "now"
}

The generated timestamp needs to be indexed and stored somewhere. Two options:

  1. Update the _source field itself, replacing the null or date maths with the resolved timestamp
  2. Leave the _source as it is, and store the resolved timestamp in doc values or as a stored field.

The advantage of the first approach is that you automatically retrieve the resolved value without having to request any specific fields. However, this is the only place where we would change the original _source field before storing it. Is this a problem?

@kimchy thoughts on altering _source?

My preference would be altering the _source, but the doc values or stored fields approach would work as well.

Replace _ttl with _expiration

While we could support the _ttl field and mapping for bwc, the new meta field should be _expiration, and its mapping would look like this:

  "_expiration": {
    "enabled":        true,
    "required":       true,            # optional
    "default":        "10m",           # optional
    "format":         "dateTime",      # optional
    "timestamp_path": "last_modified"  # optional
  }

The _expiration parameter can be passed as:

  • an absolute value, eg 1425671241123, or 2014-01-01 23:15:16 (interpreted by format)
  • date math, eg now+10m
  • relative times, eg 10m, in which case it is added to the value in the timestamp_path, if specified, otherwise to now
  • if not specified, then the default value will be used or, if no default is set (or _expiration is manually specified as 0) then the document will not expire.

Moving to plugin

The expiration of documents based on TTL feels sufficiently removed from the core that I think this functionality should be moved to a plugin. This would also allow the addition of increased checks (eg not returning expired documents in a search request, as happens today) without impacting performance for the vast majority of users who don't use the expiry functionality.

@polyfractal
Copy link
Contributor

Related discussion from a while back: #7732

Co-opting null_value into a "default" could make it tricky to differentiate "nullness" from "missing" from "default" at query time.

If null_value gets dynamic capabilities, I think it'd be important to have the equivalent for missing_value, so the user can fine-tune how and when values are generated (and therefore know what to search for at query time)

@eliwjones
Copy link

I vote for being able to have generated datetime values inside the doc['_source']

One should have the ability to simulate created_at and updated_at values.

I don't want to have change or specify anything from the application.

Not sure what the sanest way would be to specify it.. something like:

"created_at": {
  "type": "date",
  "format": "YYYY-MM-dd",
  "null_value": "now"
},
"updated_at": {
  "type": "date",
  "format": "YYYY-MM-dd",
  "value": "now"
}

seems inexact to say "value" : "now"...

@jpountz
Copy link
Contributor

jpountz commented Jul 31, 2015

We discussed this issue in fixit friday and said that regardless of the way we plan to refactor _ttl, moving the functionality to a plugin would be great.

@jpountz jpountz added help wanted adoptme and removed discuss labels Jul 31, 2015
@jpountz
Copy link
Contributor

jpountz commented Jul 31, 2015

Making it an adoptme: let's move it to a plugin first and then think about how to improve it.

@jpountz
Copy link
Contributor

jpountz commented Jul 31, 2015

Actually this looks challenging: _timestamp and _ttl do not only introduce new mappers, but also special document parsing logic, so that you can provide their values in the index URL, etc. So moving the feature to a plugin looks hard to me if we want to stay backward compatible.

@clintongormley
Copy link
Contributor

Closing in favour of #18280

@javanna javanna added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Jul 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking help wanted adoptme :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch
Projects
None yet
Development

No branches or pull requests

7 participants