Allow for the data lifecycle to be explicitly nullified #95979

Merged
merged 38 commits into from
Jun 1, 2023


gmarouli
Contributor

@gmarouli gmarouli commented May 10, 2023

In this PR we allow a template to have its lifecycle set to null. This lets a user explicitly signal that they do not want a lifecycle in a composable template, even if the component templates define one.

In other words, we now support the following:

"template": {
   "lifecycle": null
}

The above works in the following way:

  • When used in a component template, then it's the equivalent of a missing lifecycle.
  • When used in an index template, then it means that the user does not want any lifecycle defined in the data streams created by this template.

For example:

# Add component template with lifecycle
PUT _component_template/my-template
{
  "template": {
    "settings": {
      "number_of_shards": 1
    },
    "lifecycle": {
      "data_retention": "10d"
    }
  }
}

# Simulate index template with the null lifecycle
POST /_index_template/_simulate?include_defaults
{
  "index_patterns": ["my-data-*"],
  "composed_of": ["my-template"],
  "priority": 10,
  "template": {
    "lifecycle": null
  },
  "data_stream": {
  }
}

# Response
{
  "template": {
    "settings": {
      "index": {
        "number_of_shards": "1",
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_hot"
            }
          }
        }
      }
    },
    "aliases": {},
    "lifecycle": null
  },
  "overlapping": []
}

Part of: #93596

@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine elasticsearchmachine added the Team:Data Management label May 10, 2023
@elasticsearchmachine
Collaborator

Hi @gmarouli, I've created a changelog YAML for you.

@gmarouli
Contributor Author

Bumping the compatibility version of the failed tests. Since DLM is still experimental, I think we do not need to handle backwards compatibility.

@dakrone
Member

dakrone commented May 10, 2023

> When used in a component template, then it's the equivalent of a missing lifecycle.

How would a user specify from a component template that the data stream should be unmanaged by DLM?

@gmarouli gmarouli marked this pull request as draft May 11, 2023 07:58
@gmarouli
Contributor Author

gmarouli commented May 11, 2023

Converted it to draft to add support for data_retention: null

@gmarouli
Contributor Author

Bumping the compatibility version of the failed tests. Since DLM is still experimental, I think we do not need to handle backwards compatibility.

> When used in a component template, then it's the equivalent of a missing lifecycle.

> How would a user specify from a component template that the data stream should be unmanaged by DLM?

I am also doubting that part, to be honest. Right now it is not possible. A component template might not have a lifecycle defined, which means that it is not managed by DLM, but it can never override another one that has. Only the index template can.

Should we change it and treat them the same way? @dakrone is that what you had in mind?

@gmarouli gmarouli mentioned this pull request May 11, 2023
19 tasks
@dakrone
Member

dakrone commented May 15, 2023

> How would a user specify from a component template that the data stream should be unmanaged by DLM?

> I am also doubting that part to be honest. Right now it is not possible. A component template might not have lifecycle defined which means that it is not managed by DLM but it can never override another one that has. Only the index template can.

> Should we change it and treat the same way? @dakrone is that what you had in mind?

It's a little hard to reason about with prose, maybe a table will help us decide (the values in the table are the lifecycle config, assuming that A is before B in the composed_of list):

| Component A | Component B | Composable Z | Final result |
|---|---|---|---|
| "lifecycle": {} | | | "lifecycle": {} |
| "lifecycle": {} | "lifecycle": {"retention": "30d"} | | "lifecycle": {"retention": "30d"} |
| "lifecycle": {"retention": "30d"} | "lifecycle": {"retention": null} | | "lifecycle": {} |
| "lifecycle": {"retention": "30d"} | "lifecycle": null | | "lifecycle": {} |
| "lifecycle": {"retention": "30d"} | "lifecycle": {"retention": "45d"} | "lifecycle": {} | "lifecycle": {} |
| "lifecycle": {"retention": "30d"} | "lifecycle": {"retention": "45d"} | "lifecycle": null | |
| "lifecycle": {} | "lifecycle": {"retention": "45d"} | "lifecycle": {"retention": "15d"} | "lifecycle": {"retention": "15d"} |
| "lifecycle": {} | "lifecycle": {"retention": "45d"} | "lifecycle": {"retention": null} | "lifecycle": {} |

Hopefully that helps. Also, these are just some thoughts for discussion, not necessarily the way we have to decide. Either way, I think having the table is helpful to make sure we have the same thing in mind. What do you think?

@gmarouli
Contributor Author

@dakrone thank you! This is indeed really helpful. I am going to make a table of the current composition, in which explicit null values are not supported, and then we can see how this would change when we introduce them:

Current behavior:
We keep the latest configuration. A missing value means the template has no opinion, so we "inherit" what the previous templates have dictated.

Table 1

| Component A | Component B | Composable Z | Final result |
|---|---|---|---|
| "lifecycle": {} | | | "lifecycle": {} |
| "lifecycle": {} | "lifecycle": {"retention": "30d"} | | "lifecycle": {"retention": "30d"} |
| "lifecycle": {"retention": "30d"} | "lifecycle": {"retention": "45d"} | "lifecycle": {} | "lifecycle": {"retention": "45d"} |
| "lifecycle": {} | "lifecycle": {"retention": "45d"} | "lifecycle": {"retention": "15d"} | "lifecycle": {"retention": "15d"} |

Proposal # 1:
We treat a lifecycle the same way whether it is defined in a component template or in an index template.

  • a missing value represents no opinion, so we inherit the previous templates' opinions
  • a null value means we explicitly do not want this set.
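The two rules above can be sketched for a single setting such as data_retention. This is only an illustration: `RetentionOpinion` and `resolve` are hypothetical names, not Elasticsearch classes, and the sketch assumes templates are visited in composition order.

```java
import java.util.List;

/** Hypothetical model of one template's opinion about data_retention. */
final class RetentionOpinion {
    enum Kind { MISSING, EXPLICIT_NULL, VALUE }

    final Kind kind;
    final String value; // only meaningful when kind == VALUE

    private RetentionOpinion(Kind kind, String value) {
        this.kind = kind;
        this.value = value;
    }

    static RetentionOpinion missing() { return new RetentionOpinion(Kind.MISSING, null); }
    static RetentionOpinion explicitNull() { return new RetentionOpinion(Kind.EXPLICIT_NULL, null); }
    static RetentionOpinion of(String value) { return new RetentionOpinion(Kind.VALUE, value); }

    /** Returns the effective retention, or null when no retention ends up configured. */
    static String resolve(List<RetentionOpinion> inCompositionOrder) {
        String effective = null;
        for (RetentionOpinion current : inCompositionOrder) {
            switch (current.kind) {
                case MISSING:
                    break; // no opinion: inherit what earlier templates decided
                case EXPLICIT_NULL:
                    effective = null; // explicitly unset whatever was inherited
                    break;
                case VALUE:
                    effective = current.value; // latest concrete value wins
                    break;
            }
        }
        return effective;
    }

    public static void main(String[] args) {
        System.out.println(resolve(List.of(of("30d"), explicitNull())));       // null
        System.out.println(resolve(List.of(of("30d"), missing())));            // 30d
        System.out.println(resolve(List.of(missing(), of("45d"), of("15d")))); // 15d
    }
}
```

Keeping "missing" and "explicit null" as separate states is what lets a later template distinguish "no opinion" from "unset this".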

This means we extend the table above with the following cases:

Table 2a

| Component A | Component B | Composable Z | Final result |
|---|---|---|---|
| "lifecycle": {"retention": "30d"} | "lifecycle": {"retention": null} | | "lifecycle": {} |
| "lifecycle": {} | "lifecycle": {"retention": "45d"} | "lifecycle": {"retention": null} | "lifecycle": {} |
| "lifecycle": {"retention": "30d"} | "lifecycle": {"retention": "45d"} | "lifecycle": null | |
| "lifecycle": {"retention": "30d"} | "lifecycle": null | | |
| "lifecycle": {"retention": "30d"} | "lifecycle": null | "lifecycle": {"retention": "15d"} | "lifecycle": {"retention": "15d"} |

Proposal # 2:
We treat lifecycle: null differently depending on whether it appears in a component template or in an index template.

  • a missing value always represents no opinion, so we inherit the previous templates' opinions
  • a null value in component template represents no opinion, so we inherit the previous templates' opinions
  • a null value in an index template represents that we explicitly do not want this data stream managed.
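For contrast, here is a rough sketch of the location-dependent rule in proposal # 2. All names (`LocationDependentNull`, `Entry`, `isManaged`) are hypothetical, and the sketch only tracks whether the data stream ends up managed, not the retention value.

```java
import java.util.List;

/** Hypothetical sketch of proposal #2; names are illustrative only. */
final class LocationDependentNull {
    enum Source { COMPONENT_TEMPLATE, INDEX_TEMPLATE }
    enum Kind { MISSING, EXPLICIT_NULL, DEFINED }

    static final class Entry {
        final Source source;
        final Kind kind;

        Entry(Source source, Kind kind) {
            this.source = source;
            this.kind = kind;
        }
    }

    /** Returns true when the resulting data stream would be managed by a lifecycle. */
    static boolean isManaged(List<Entry> inCompositionOrder) {
        boolean managed = false;
        for (Entry entry : inCompositionOrder) {
            if (entry.kind == Kind.DEFINED) {
                managed = true;
            } else if (entry.kind == Kind.EXPLICIT_NULL && entry.source == Source.INDEX_TEMPLATE) {
                managed = false; // only the index template may opt out
            }
            // MISSING, or EXPLICIT_NULL in a component template: no opinion, inherit
        }
        return managed;
    }

    public static void main(String[] args) {
        Entry defined = new Entry(Source.COMPONENT_TEMPLATE, Kind.DEFINED);
        Entry componentNull = new Entry(Source.COMPONENT_TEMPLATE, Kind.EXPLICIT_NULL);
        Entry indexNull = new Entry(Source.INDEX_TEMPLATE, Kind.EXPLICIT_NULL);

        System.out.println(isManaged(List.of(defined, componentNull))); // true
        System.out.println(isManaged(List.of(defined, indexNull)));     // false
    }
}
```

The extra `source` check is the inconsistency that proposal # 1 avoids: the same null means different things depending on where it is written.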

This means we extend the table above with the following cases:

Table 2b

| Component A | Component B | Composable Z | Final result |
|---|---|---|---|
| "lifecycle": {"retention": "30d"} | "lifecycle": {"retention": null} | | "lifecycle": {} |
| "lifecycle": {} | "lifecycle": {"retention": "45d"} | "lifecycle": {"retention": null} | "lifecycle": {} |
| "lifecycle": {"retention": "30d"} | "lifecycle": {"retention": "45d"} | "lifecycle": null | |
| "lifecycle": {"retention": "30d"} | "lifecycle": null | | "lifecycle": {} |
| "lifecycle": {"retention": "30d"} | "lifecycle": null | "lifecycle": {"retention": "15d"} | "lifecycle": {"retention": "15d"} |

My recommendation:
Judging strictly by how the behavior can be explained, and not by how it might be used, I favor proposal # 1: null values behave consistently for both lifecycle and data_retention, irrespective of where they are used (component template or index template). I believe it's easier to comprehend that null means "ignore everything else we have seen so far".

@dakrone & @masseyke what do you think? Does this way of thinking stand?

@dakrone
Member

dakrone commented May 16, 2023

Thanks for presenting the proposals like this Mary — I favor proposal # 1 also. I think consistency is good to have no matter what kind of template.

@masseyke
Member

Your Table 1 has some conditions that aren't in Table 2a or 2b. I assume since null is not involved in those, the behavior would stay the same, right?
I prefer proposal 1 over proposal 2 as well. I think I'm confused by the last row in table 2a though (I probably ought to look at this in the morning instead of in the evening) -- Component B explicitly makes the lifecycle null, so doesn't that mean the final result would be null?

@gmarouli
Contributor Author

> Your Table 1 has some conditions that aren't in Table 2a or 2b. I assume since null is not involved in those, the behavior would stay the same, right?

Exactly, table 1 remains the same between the two proposals and depicts how composition works right now that we do not allow the null value.

> I prefer proposal 1 over proposal 2 as well. I think I'm confused by the last row in table 2a though (I probably ought to look at this in the morning instead of in the evening) -- Component B explicitly makes the lifecycle null, so doesn't that mean the final result would be null?

No worries, I can explain. There is precedence between the component templates and the index template, so it works like this:

  1. Component template A: We start with this template.
  2. Component template B: We "compose" it with A, which means we add new fields or overwrite existing ones. In this case B overwrites the lifecycle defined in A and would render the data stream unmanaged if it weren't for the index template.
  3. Index template Z: We take the template from this one and "compose" it with the result of A and B. Since that result was a null lifecycle, there is nothing to merge, and the lifecycle defined in this template overwrites it.

Does this make sense? If we did not do that, any lifecycle: null in the list would render a data stream unmanaged, which I think is too aggressive an approach.
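The three steps above can be sketched roughly as follows. This is a simplified stand-in, not the actual composeDataLifecycles code: `SketchLifecycle` and `compose` are hypothetical, only a retention string is modeled, and an explicit `"retention": null` is not covered.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical stand-in for a lifecycle; not the Elasticsearch DataLifecycle class. */
final class SketchLifecycle {
    /** Sentinel for an explicit "lifecycle": null; always compared by identity. */
    static final SketchLifecycle NO_LIFECYCLE = new SketchLifecycle(null);

    final String retention; // null means no retention configured, i.e. "lifecycle": {}

    SketchLifecycle(String retention) {
        this.retention = retention;
    }

    /**
     * Composes lifecycles in precedence order: component templates first, index template last.
     * Pass null for the index template when it has no lifecycle key at all.
     * Returns null when the data stream ends up without a lifecycle.
     */
    static SketchLifecycle compose(List<SketchLifecycle> componentLifecycles, SketchLifecycle indexTemplateLifecycle) {
        List<SketchLifecycle> ordered = new ArrayList<>(componentLifecycles);
        if (indexTemplateLifecycle != null) {
            ordered.add(indexTemplateLifecycle);
        }
        SketchLifecycle result = null;
        for (SketchLifecycle current : ordered) {
            if (current == NO_LIFECYCLE) {
                result = null; // explicit null discards everything composed so far
            } else if (result == null) {
                result = current; // nothing to merge with, take it as is
            } else if (current.retention != null) {
                result = new SketchLifecycle(current.retention); // overwrite the inherited retention
            }
            // otherwise current is "lifecycle": {} and the inherited retention is kept
        }
        return result;
    }

    public static void main(String[] args) {
        SketchLifecycle a = new SketchLifecycle("30d");
        // Last row of Table 2a: B nullifies A, then the index template defines 15d.
        SketchLifecycle last = compose(List.of(a, NO_LIFECYCLE), new SketchLifecycle("15d"));
        System.out.println(last.retention); // 15d
        // Index template opts out entirely.
        System.out.println(compose(List.of(a, new SketchLifecycle("45d")), NO_LIFECYCLE)); // null
    }
}
```

Because the index template is processed last, a null in a component template only wipes what came before it; a later template can still define a lifecycle.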

@masseyke
Member

> there is precedence between the component template and the index template

Oh OK -- I had thought that in proposal 1 you were removing that precedence.

@gmarouli
Contributor Author

@elasticmachine update branch

@gmarouli gmarouli requested a review from andreidan May 29, 2023 19:07
@gmarouli
Contributor Author

@elasticmachine run elasticsearch-ci/bwc

Contributor

@andreidan andreidan left a comment

Thanks for iterating on this Mary. This is shaping up nicely.

Left a few more comments

@gmarouli gmarouli requested a review from andreidan May 30, 2023 11:50
@gmarouli
Contributor Author

Waiting for: #96428

@gmarouli
Contributor Author

@elasticmachine update branch

@gmarouli
Contributor Author

@elasticmachine update branch

Contributor

@andreidan andreidan left a comment

Thanks for iterating on this Mary.

Left a final batch of comments and this is almost ready 🚀

@@ -180,7 +206,15 @@ public void writeTo(StreamOutput out) throws IOException {
out.writeMap(this.aliases, StreamOutput::writeString, (stream, aliasMetadata) -> aliasMetadata.writeTo(stream));
}
if (out.getTransportVersion().onOrAfter(TransportVersion.V_8_8_0) && DataLifecycle.isEnabled()) {
out.writeOptionalWriteable(lifecycle);
if (out.getTransportVersion().onOrAfter(TransportVersion.V_8_500_007)) {
boolean isExplicitNull = NO_LIFECYCLE.equals(lifecycle);
Contributor

We shouldn't use equals here but ==

Should we also add a unit test that fails if equals is used here?
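To illustrate why an explicit null needs its own marker on the wire, here is a small sketch using plain java.io streams. It is not Elasticsearch's StreamOutput protocol: `LifecycleWire`, its nested `Lifecycle` class, and the byte layout are assumptions made for the example, version gating is left out, and retention is kept non-null to keep the sketch short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical wire format for a three-state lifecycle: absent, explicit null, or defined. */
final class LifecycleWire {
    static final class Lifecycle {
        final String retention; // non-null in this sketch

        Lifecycle(String retention) {
            this.retention = retention;
        }
    }

    /** Sentinel for an explicit "lifecycle": null. */
    static final Lifecycle NO_LIFECYCLE = new Lifecycle("");

    static byte[] write(Lifecycle lifecycle) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            boolean isExplicitNull = lifecycle == NO_LIFECYCLE; // identity check, not equals
            out.writeBoolean(isExplicitNull);
            if (!isExplicitNull) {
                out.writeBoolean(lifecycle != null); // presence flag for the optional value
                if (lifecycle != null) {
                    out.writeUTF(lifecycle.retention);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Lifecycle read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            if (in.readBoolean()) {
                return NO_LIFECYCLE; // restore the shared sentinel so identity checks keep working
            }
            return in.readBoolean() ? new Lifecycle(in.readUTF()) : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(read(write(NO_LIFECYCLE)) == NO_LIFECYCLE); // true
        System.out.println(read(write(null)) == null);                 // true
        System.out.println(read(write(new Lifecycle("10d"))).retention); // 10d
    }
}
```

Returning the shared sentinel on read matters: a freshly allocated object would no longer pass an `==` comparison on the receiving side.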

@@ -1508,7 +1508,7 @@ public static DataLifecycle resolveLifecycle(ComposableIndexTemplate template, M
     public static DataLifecycle composeDataLifecycles(List<DataLifecycle> lifecycles) {
         DataLifecycle.Builder builder = null;
         for (DataLifecycle current : lifecycles) {
-            if (current.equals(Template.NO_LIFECYCLE)) {
+            if (current == Template.NO_LIFECYCLE) {
Contributor

Should we add a test that fails if equals is used here?

Contributor Author

Or maybe better just implement equals as follows:

        @Override
        public boolean equals(Object o) {
            return this == o;
        }

Contributor

> Do you mean to throw an exception if someone is using equals?

No, just a unit test that uses a DataLifecycle that is equal to Template.NO_LIFECYCLE (e.g. the INFINITE_RETENTION one), composes it using composeDataLifecycles, and asserts the correct result.

This test would fail if we change the code to

if (current.equals(Template.NO_LIFECYCLE)) {

Hope that makes sense

Contributor Author

I think it does, let me try

Contributor Author

OK, I am back. Because equals and == behave the same in this case, I cannot really make a unit test fail, unless you had in mind mocking DataLifecycle to ensure that equals is not called, and I am not sure we should go that far. If I am not mistaken, == is nicer, but since both are correct I don't think we should go to such lengths. What do you think?

PS: the equals fails because they are not the same class type.

Contributor

@andreidan andreidan Jun 1, 2023

> the equals fails because they are not the same class type

Ah, of course - the method override at instantiation time gives us this.
If we remove that override and somehow end up using equals the testLifecycleComposition test will fail (using == in the meantime makes us resilient to the method override removal)

I'm good with this 🚀
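The point about the override can be shown with a small sketch. It assumes, as the thread suggests, that the sentinel is created as an anonymous subclass and that the regular equals compares the runtime class; `SentinelEquality` and its nested `Lifecycle` are hypothetical stand-ins, not the real DataLifecycle.

```java
import java.util.Objects;

/** Hypothetical illustration of == versus equals for a sentinel instance. */
final class SentinelEquality {
    static class Lifecycle {
        final String retention;

        Lifecycle(String retention) {
            this.retention = retention;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            return Objects.equals(retention, ((Lifecycle) o).retention);
        }

        @Override
        public int hashCode() {
            return Objects.hash(retention);
        }
    }

    /** A lifecycle without retention, i.e. "lifecycle": {}. */
    static final Lifecycle INFINITE_RETENTION = new Lifecycle(null);

    /** Sentinel created as an anonymous subclass, so its runtime class differs from Lifecycle. */
    static final Lifecycle NO_LIFECYCLE = new Lifecycle(null) {
        @Override
        public boolean equals(Object o) {
            return this == o;
        }

        @Override
        public int hashCode() {
            return System.identityHashCode(this);
        }
    };

    public static void main(String[] args) {
        // With the override in place, equals and == agree for the sentinel.
        System.out.println(INFINITE_RETENTION.equals(NO_LIFECYCLE)); // false
        System.out.println(NO_LIFECYCLE.equals(INFINITE_RETENTION)); // false

        // Without the override, a plain instance with the same state would be "equal",
        // while an identity comparison still tells the two apart.
        Lifecycle plainSentinel = new Lifecycle(null);
        System.out.println(INFINITE_RETENTION.equals(plainSentinel)); // true
        System.out.println(INFINITE_RETENTION == plainSentinel);      // false
    }
}
```

This is why `==` stays correct even if the override is later removed, whereas `equals` would start treating an empty lifecycle as the sentinel.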

@gmarouli gmarouli requested a review from andreidan June 1, 2023 13:57
Contributor

@andreidan andreidan left a comment

LGTM, thanks for implementing this

@gmarouli
Contributor Author

gmarouli commented Jun 1, 2023

@elasticmachine update branch

@gmarouli gmarouli merged commit 8363e8c into elastic:main Jun 1, 2023
dakrone added a commit to dakrone/elasticsearch-specification that referenced this pull request Jun 27, 2023
These keys allow using an explicit `null` as the value to "unset" the configuration when merging multiple component templates. As related to the merging tables seen in elastic/elasticsearch#95979 (comment)

Relates also to the discussion in elastic#2049
@gmarouli gmarouli deleted the dlm-nullify-retention branch August 20, 2024 06:58
Labels
>enhancement, Team:Data Management, v8.9.0