Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce data stream options and failure store configuration classes #109515

Merged
merged 22 commits into from
Sep 11, 2024

Conversation

gmarouli
Copy link
Contributor

@gmarouli gmarouli commented Jun 10, 2024

In order to facilitate enabling and disabling the failure store & component template composition, we introduce new metadata classes that can support a more extensible failure store configuration.

We would like to introduce data stream options. Data stream options capture the configuration of data stream level (smaller and larger) features, such the failure store and in the future data stream lifecycle. They are different than settings because they are applied on a data stream level and not per backing index.

This PR is only setting the basic classes to enable follow up PRs that will actually use them.

Examples, these are not final, they are only used to help visualise a potential direction:

GET _data_stream/my-*/_options
{
  "data_streams": [
    {
      "name": "my-non-opinionated-ds",
      "options": {
      }
    },
    {
      "name": "my-fs",
      "options": {
        "failure_store": {
          "enabled": true
        }
      }
    },
    {
      "name": "my-no-fs",
      "options": {
        "failure_store": {
          "enabled": false
        }
      }
    }
  ]
}

// If we decide to add lifecycle here too:
PUT _data_stream/my-fs/_options
{
   "failure_store": {
     "enabled": true
   },
   "lifecycle": {
   }
}

What we see above are 3 data streams:

  • my-fs with the failure store explicitly enabled
  • my-no-fs with the failure store explicitly disabled, and
  • my-non-opinionated-ds which does not specify what to do with the failure store, so for now it means failure store disabled but that could change in the future.

Template composition examples pending

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Jun 10, 2024
Copy link
Contributor

@nielsbauman nielsbauman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left one minor typo suggestion, other than that it LGTM :).
I do think it makes sense to wait with merging this, because the DataStreamOptions implementation might differ a bit depending on the ongoing discussions we're having. What do you think?

@gmarouli gmarouli requested review from dakrone and nielsbauman June 10, 2024 12:52
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple of comments. If we decide we want to include lifecycle options (which I would like to, but I know it's up for discussion), then it'd be interesting to see that here, even if it's not used yet.

@gmarouli
Copy link
Contributor Author

I left one minor typo suggestion, other than that it LGTM :). I do think it makes sense to wait with merging this, because the DataStreamOptions implementation might differ a bit depending on the ongoing discussions we're having. What do you think?

Yes, definitely, I am open to change it. This is supposed to help identify red flags by having something more concrete. With this I can at least keep preparing for the API even if a few things could change, without feeling I am going on a wrong direction.

@gmarouli
Copy link
Contributor Author

I left a couple of comments. If we decide we want to include lifecycle options (which I would like to, but I know it's up for discussion), then it'd be interesting to see that here, even if it's not used yet.

Happy to include it here just for demonstration purposes. I can always revert the commit if we think it doesn't fit anymore.

@gmarouli gmarouli requested a review from dakrone June 11, 2024 07:01
@gmarouli
Copy link
Contributor Author

@elasticmachine update branch

@gmarouli
Copy link
Contributor Author

@elasticmachine update branch

@gmarouli
Copy link
Contributor Author

@elasticmachine update branch

@gmarouli
Copy link
Contributor Author

@dakrone & @jbaiera could you give this another look please. Thank you ☺️ .

@gmarouli
Copy link
Contributor Author

@elasticmachine update branch

@gmarouli
Copy link
Contributor Author

@elasticmachine update branch

@gmarouli
Copy link
Contributor Author

@elasticmachine update branch

Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I left one comment

Comment on lines 47 to 49
public static DataStreamFailureStore read(StreamInput in) throws IOException {
return new DataStreamFailureStore(in.readBoolean());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you went with a static method rather than a constructor? We've been moving to constructors for this more and more in the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I wasn't aware we prefer constructors. I preferred the static method because it allows more flexibility, so I thought it was better to always use a static method to facilitate all cases.

However, if this is not the direction we are following and in this case there is no blocker in using a constructor, I will change it to use a constructor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm personally also more of a fan of static methods (because in some cases we need to use a static method rather than a constructor).

@gmarouli
Copy link
Contributor Author

gmarouli commented Sep 4, 2024

@elasticmachine update branch

@gmarouli
Copy link
Contributor Author

gmarouli commented Sep 4, 2024

@jbaiera if you want to see how it looks with the lifecycle as well, check the diff in a26bb6d (the revert commit)

* supports the following configurations:
* - enabled
*/
public record DataStreamFailureStore(boolean enabled) implements SimpleDiffable<DataStreamFailureStore>, ToXContentObject {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the things we're talking about doing is adding a cluster setting that will contain index patterns which will state that failure store is enabled for any matching indices by default. The plan is to fall back to using that cluster setting in the case that failure stores haven't been explicitly enabled or disabled via templates at creation time, or in the state.

To save on headache going forward, should this boolean be an optional Boolean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! thinking further about this, I would expect that the following to be the not explicitly set:

  • Explicitly enabled:
PUT _data_stream/my-fs/_options
{
   "failure_store": {
     "enabled": true
   }
}
  • Explicitly disabled:
PUT _data_stream/my-fs/_options
{
   "failure_store": {
     "enabled": false
   }
}
  • No explicit configuration:
PUT _data_stream/my-fs/_options
{}

This follows the way that the lifecycle works. In template composition, a user can set failure_store: null to remove any configuration set by previous templates.

Since this is a new configuration type we could always decide we would like to change the way we work with this, but I thought we should at least consider it.

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, since there is only one property we can just assume the entire section will be removed from options if it's not present. I'm good with that

@gmarouli gmarouli requested a review from jbaiera September 9, 2024 11:02
@gmarouli
Copy link
Contributor Author

@elasticmachine update branch

Copy link
Member

@jbaiera jbaiera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

* supports the following configurations:
* - enabled
*/
public record DataStreamFailureStore(boolean enabled) implements SimpleDiffable<DataStreamFailureStore>, ToXContentObject {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, since there is only one property we can just assume the entire section will be removed from options if it's not present. I'm good with that

@gmarouli
Copy link
Contributor Author

@elasticmachine update branch

@gmarouli gmarouli added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Sep 11, 2024
@elasticsearchmachine elasticsearchmachine merged commit 077b585 into elastic:main Sep 11, 2024
15 checks passed
@gmarouli gmarouli deleted the add-data-stream-options branch September 11, 2024 14:46
v1v added a commit to v1v/elasticsearch that referenced this pull request Sep 12, 2024
…tion-ironbank-ubi

* upstream/main: (302 commits)
  Deduplicate BucketOrder when deserializing (elastic#112707)
  Introduce test utils for ingest pipelines (elastic#112733)
  [Test] Account for auto-repairing for shard gen file (elastic#112778)
  Do not throw in task enqueued by CancellableRunner (elastic#112780)
  Mute org.elasticsearch.script.StatsSummaryTests testEqualsAndHashCode elastic#112439
  Mute org.elasticsearch.repositories.blobstore.testkit.integrity.RepositoryVerifyIntegrityIT testTransportException elastic#112779
  Use a dedicated test executor in MockTransportService (elastic#112748)
  Estimate segment field usages (elastic#112760)
  (Doc+) Inference Pipeline ignores Mapping Analyzers (elastic#112522)
  Fix verifyVersions task (elastic#112765)
  (Doc+) Terminating Exit Codes (elastic#112530)
  (Doc+) CAT Nodes default columns (elastic#112715)
  [DOCS] Augment installation warnings (elastic#112756)
  Mute org.elasticsearch.repositories.blobstore.testkit.integrity.RepositoryVerifyIntegrityIT testCorruption elastic#112769
  Bump Elasticsearch to a minimum of JDK 21 (elastic#112252)
  ESQL: Compute support for filtering ungrouped aggs (elastic#112717)
  Bump Elasticsearch version to 9.0.0 (elastic#112570)
  add CDR related data streams to kibana_system priviliges (elastic#112655)
  Support widening of numeric types in union-types (elastic#112610)
  Introduce data stream options and failure store configuration classes (elastic#109515)
  ...
davidkyle pushed a commit that referenced this pull request Sep 12, 2024
…#109515)

In order to facilitate enabling and disabling the failure store &
component template composition, we introduce new metadata classes that
can support a more extensible failure store configuration.

We would like to introduce **data stream options**. Data stream options
capture the configuration of data stream level (smaller and larger)
features, such the failure store and in the future data stream
lifecycle. They are different than settings because they are applied on
a data stream level and not per backing index.

This PR is only setting the basic classes to enable follow up PRs that
will actually use them. 

Examples, these are not final, they are only used to help visualise a
potential direction:

```
GET _data_stream/my-*/_options
{
  "data_streams": [
    {
      "name": "my-non-opinionated-ds",
      "options": {
      }
    },
    {
      "name": "my-fs",
      "options": {
        "failure_store": {
          "enabled": true
        }
      }
    },
    {
      "name": "my-no-fs",
      "options": {
        "failure_store": {
          "enabled": false
        }
      }
    }
  ]
}

// If we decide to add lifecycle here too:
PUT _data_stream/my-fs/_options
{
   "failure_store": {
     "enabled": true
   },
   "lifecycle": {
   }
}
```

What we see above are 3 data streams:

- `my-fs` with the failure store explicitly enabled
- `my-no-fs` with the failure store explicitly disabled, and
- `my-non-opinionated-ds` which does not specify what to do with the failure store, so for now it means failure store disabled but that could change in the future.

Template composition examples pending
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Data Management/Data streams Data streams and their lifecycles >non-issue Team:Data Management Meta label for data/management team v8.16.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants