Introduce downsampling configuration for data stream lifecycle #97041

Merged

@gmarouli gmarouli commented Jun 23, 2023

This change adds support for downsampling lifecycle config like the following:

{
  "lifecycle": {
    "data_retention": "90d",
    "downsampling": [
      {
        "after": "1d",
        "fixed_interval": "2h"
      },
      { "after": "15d", "fixed_interval": "1d" },
      { "after": "30d", "fixed_interval": "1w" }
    ]
  }
}
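
As a rough illustration of how such a list of rounds could be read, the sketch below picks the round that applies to a backing index of a given age. It assumes (this is not confirmed by the PR) that rounds are sorted by ascending `after` and that the last round whose `after` has elapsed is the one that applies. `RoundSelection` and `applicableRound` are hypothetical names, and `java.time.Duration` stands in for Elasticsearch's `TimeValue`.

```java
import java.time.Duration;
import java.util.List;
import java.util.Optional;

public class RoundSelection {
    // Hypothetical stand-in for a downsampling round; not the record from the PR.
    record Round(Duration after, Duration fixedInterval) {}

    // Returns the last round whose "after" threshold the index age has reached.
    // Assumes the list is sorted by ascending "after".
    static Optional<Round> applicableRound(List<Round> rounds, Duration age) {
        Round match = null;
        for (Round round : rounds) {
            if (age.compareTo(round.after()) >= 0) {
                match = round;
            }
        }
        return Optional.ofNullable(match);
    }
}
```

Under these assumptions, with the configuration above a 20-day-old index would fall into the second round (`fixed_interval` of `1d`), and an index younger than one day would match no round.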

We will also support explicitly stating that we do not want any downsampling (for templates):

{
  "lifecycle": {
    "data_retention": "90d",
    "downsampling": null
  }
}

And stating that we have no preference for downsampling, by omitting the field:

{
  "lifecycle": {
    "data_retention": "90d"
  }
}
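
The three cases above (a list of rounds, an explicit null, and an absent field) can be modelled as a tri-state value. The following is a minimal sketch of one way template composition could treat them, assuming that a higher-priority template overrides a lower-priority one only when it states something. `DownsamplingPreference`, `compose`, and `effectiveRounds` are hypothetical names for illustration; only the idea of a `Downsampling.NULL` sentinel comes from the PR.

```java
import java.util.List;

public class DownsamplingPreference {
    // Hypothetical simplified round; the intervals are kept as plain strings here.
    record Round(String after, String fixedInterval) {}

    record Downsampling(List<Round> rounds) {
        // Sentinel meaning "downsampling was explicitly set to null".
        static final Downsampling NULL = new Downsampling(null);
    }

    // A null reference means "no preference", so the base value is kept.
    // Anything else, including the NULL sentinel, overrides the base.
    static Downsampling compose(Downsampling base, Downsampling override) {
        return override != null ? override : base;
    }

    // Resolves the composed value into the rounds that would actually apply.
    static List<Round> effectiveRounds(Downsampling resolved) {
        if (resolved == null || resolved.rounds() == null) {
            return List.of();
        }
        return resolved.rounds();
    }
}
```

In this sketch, composing a template that defines rounds with one that omits the field keeps the rounds, while composing it with one that sets `"downsampling": null` yields no rounds.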

Disclaimer
This proposal is an alternative implementation to #96848.

We propose to not add new functionality to AbstractObjectParser.java because our understanding is that the nullification needed for template composition is not something we want to have in general. We chose to expose the parsing of an array value, so this special array parsing can be contained within the DataLifecycle code.

Part of: #93596

@gmarouli gmarouli requested a review from andreidan June 23, 2023 09:36
@gmarouli gmarouli added the :Data Management/Data streams Data streams and their lifecycles label Jun 23, 2023
@gmarouli
Contributor Author

@elasticmachine update branch

Comment on lines +75 to +81
PARSER.declareField(ConstructingObjectParser.optionalConstructorArg(), (p, c) -> {
    if (p.currentToken() == XContentParser.Token.VALUE_NULL) {
        return Downsampling.NULL;
    } else {
        return new Downsampling(AbstractObjectParser.parseArray(p, c, Downsampling.Round::fromXContent));
    }
}, DOWNSAMPLING_FIELD, ObjectParser.ValueType.OBJECT_ARRAY_OR_NULL);
Contributor

This approach LGTM, thanks Mary
// cc @masseyke

@gmarouli gmarouli requested review from andreidan and masseyke June 26, 2023 20:02
@gmarouli gmarouli marked this pull request as ready for review June 26, 2023 20:02
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Jun 26, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@gmarouli gmarouli changed the title from "Alternative draft proposal for adding downsampling configuration" to "Introduce downsampling configuration for data stream lifecycle" Jun 26, 2023
@elasticsearchmachine
Collaborator

Hi @gmarouli, I've created a changelog YAML for you.

Contributor

@andreidan andreidan left a comment

Thanks for working on this Mary, this looks great

Left a couple of comments

* @param after is a TimeValue configuring how old (based on generation age) a backing index should be before downsampling
* @param fixedInterval is a TimeValue configuring the interval at which the backing index is going to be downsampled.
*/
public record Round(TimeValue after, TimeValue fixedInterval) implements Writeable, ToXContentObject {
Contributor

Could we use the existing DownsampleConfig instead of just a TimeValue for the fixedInterval?

We'll later be able to use it in the DownsampleAction#Request and implement the intervals validations similar to how we do it in https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/downsample/TransportDownsampleAction.java#L527C43-L527C43

Contributor Author

Definitely, I was so focused on how to parse it I completely forgot to check for existing code. Thanks @andreidan

Contributor Author

I had to make some adjustments to DownsampleConfig to use it in DataLifecycle, but I believe they are not too invasive.

+ "."
);
}
if (round.fixedInterval.compareTo(previous.fixedInterval) < 0) {
Contributor

Shall we also introduce the validation that all fixed_intervals are required to be multiples of each other?

(This could be a follow-up as well)
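
A minimal sketch of what such a check could look like is shown below. It is not the validation that was eventually shared in this PR: `RoundValidation` and `validate` are hypothetical names, `java.time.Duration` stands in for `TimeValue`, and the exact rules (strictly increasing `after`, and each `fixed_interval` being a larger multiple of the previous one) are assumptions.

```java
import java.time.Duration;
import java.util.List;

public class RoundValidation {
    // Hypothetical stand-in for a downsampling round.
    record Round(Duration after, Duration fixedInterval) {}

    // Throws IllegalArgumentException if the rounds are not consistent with each other.
    static void validate(List<Round> rounds) {
        for (Round round : rounds) {
            if (round.fixedInterval().toMillis() <= 0) {
                throw new IllegalArgumentException("fixed_interval must be positive: " + round);
            }
        }
        for (int i = 1; i < rounds.size(); i++) {
            Round previous = rounds.get(i - 1);
            Round round = rounds.get(i);
            // Each round must start later than the one before it.
            if (round.after().compareTo(previous.after()) <= 0) {
                throw new IllegalArgumentException("'after' must increase between rounds: " + round);
            }
            long previousMillis = previous.fixedInterval().toMillis();
            long currentMillis = round.fixedInterval().toMillis();
            // Each interval must be larger than, and a whole multiple of, the previous one.
            if (currentMillis <= previousMillis || currentMillis % previousMillis != 0) {
                throw new IllegalArgumentException(
                    "fixed_interval must be a larger multiple of the previous one: " + round);
            }
        }
    }
}
```

Under these rules the intervals from the PR description (2h, 1d, 1w) pass, whereas a sequence such as 2h followed by 3h is rejected because 3h is not a multiple of 2h.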

Contributor Author

I will give it a try.

Contributor Author

@gmarouli gmarouli Jun 29, 2023

I extracted this validation, so now it's shared.

@gmarouli gmarouli requested a review from andreidan June 29, 2023 11:03
Contributor

@andreidan andreidan left a comment

LGTM, this is great, thanks Mary

@gmarouli
Contributor Author

gmarouli commented Jun 29, 2023

Thank you for the quick review and the sharp comments! 🚀

@gmarouli gmarouli merged commit f87c2c7 into elastic:main Jun 29, 2023
@gmarouli gmarouli deleted the data-stream-lifecycle-downsampling-config branch August 20, 2024 06:58
Labels
:Data Management/Data streams Data streams and their lifecycles >feature Team:Data Management Meta label for data/management team v8.10.0

4 participants