Add support for data streams in ES Archiver #69061

Closed
jonathan-buttner opened this issue Jun 12, 2020 · 6 comments · Fixed by #132853
Labels
impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. Team:Operations Team label for Operations Team triage_needed

Comments

@jonathan-buttner
Contributor

Describe the feature

It would be useful for testing to have support for data streams in ES Archiver. Data streams cause a couple of issues with ES Archiver's current implementation.

Append-only writes

Data streams require append-only writes, which must be done using the create operation when inserting a document into a data stream. A temporary fix was added in this PR: #68794
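
As a rough illustration of the append-only requirement, here is a minimal sketch of how bulk request lines could pick the create operation for data stream documents and the index operation otherwise. The ArchivedDoc shape, the isDataStream flag, and the buildBulkBody helper are assumptions made up for this example; they are not ES Archiver's actual types or the change made in #68794.

```typescript
// Hypothetical shape for an archived document; not ES Archiver's real type.
interface ArchivedDoc {
  index: string; // target index or data stream name
  id?: string;
  source: Record<string, unknown>;
  isDataStream?: boolean; // assumed flag recorded when the archive was saved
}

type BulkLine = Record<string, unknown>;

// Builds the action/source line pairs expected by the Elasticsearch _bulk API.
// Data stream targets get the "create" action; everything else gets "index".
function buildBulkBody(docs: ArchivedDoc[]): BulkLine[] {
  const lines: BulkLine[] = [];
  for (const doc of docs) {
    const op = doc.isDataStream ? 'create' : 'index';
    const meta: Record<string, unknown> = { _index: doc.index };
    if (doc.id !== undefined) {
      meta._id = doc.id;
    }
    lines.push({ [op]: meta }); // action line
    lines.push(doc.source); // document source line
  }
  return lines;
}
```

The returned array would still need to be serialized as newline-delimited JSON (or handed to a client that does so) before being sent.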

If you try to add a document without create, you will get this error:

"name": "ResponseError",
  "meta": {
    "body": {
      "error": {
        "root_cause": [
          {
            "type": "illegal_argument_exception",
            "reason": "The provided expression [events-endpoint-1] matches a data stream, specify the corresponding concrete indices instead."
          }
        ],
        "type": "illegal_argument_exception",
        "reason": "The provided expression [events-endpoint-1] matches a data stream, specify the corresponding concrete indices instead."
      },
      "status": 400
    },
    "statusCode": 400,

Another option is to insert documents directly into the backing indices of a data stream, but doing this doesn't actually create the data stream.

Data stream specific APIs

ES Archiver needs to use the new APIs to create and delete a data stream: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/data-stream-apis.html

The templates used for a data stream also need to include specific fields.
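
For reference, a minimal sketch of the requests involved, written as plain request descriptors rather than calls on any particular client. The EsRequest type and the three helper functions are hypothetical names for this example. The sketch assumes the composable index template format, where a data_stream object in the template marks matching names as data streams, and assumes a date-typed @timestamp mapping; check both against the Elasticsearch version in use.

```typescript
// Hypothetical request descriptor; any HTTP or Elasticsearch client could send it.
interface EsRequest {
  method: 'PUT' | 'DELETE';
  path: string;
  body?: Record<string, unknown>;
}

// Index template whose "data_stream" object enables data streams for the pattern.
function dataStreamTemplateRequest(templateName: string, pattern: string): EsRequest {
  return {
    method: 'PUT',
    path: `/_index_template/${encodeURIComponent(templateName)}`,
    body: {
      index_patterns: [pattern],
      data_stream: {},
      template: {
        // Assumption: documents carry a date-typed @timestamp field.
        mappings: { properties: { '@timestamp': { type: 'date' } } },
      },
    },
  };
}

// Explicitly create a data stream (requires a matching template to exist).
function createDataStreamRequest(name: string): EsRequest {
  return { method: 'PUT', path: `/_data_stream/${encodeURIComponent(name)}` };
}

// Delete a data stream together with its backing indices.
function deleteDataStreamRequest(name: string): EsRequest {
  return { method: 'DELETE', path: `/_data_stream/${encodeURIComponent(name)}` };
}
```

Keeping the requests as data makes the ordering easy to see: the template has to be in place before the data stream is created, and cleanup deletes the data stream rather than its backing indices.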

Describe a specific use case for the feature

The endpoint team leverages data streams for ingesting endpoint data. Having full support for creating/deleting data streams would aid our development of the endpoint app.

Implementation details

Here are some implementation details from a conversation with @spalger

Slack Conversation

When we create an archive, we'll probably get a string in the "indices" list passed to node scripts/es_archiver save {name} {indices}, and from there we could identify that the "index" is actually a data stream.

From there we should be able to write a record of type "data_stream" to mappings.json so that we know what to clear before we write data.

We could also attach a property to the doc records written to data.json files that indicates the document should be passed to a specific data stream rather than indexed like a normal doc.
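
The three steps above could be sketched roughly as follows. Every name here (MappingRecord, DocRecord, the data_stream property, and the helper functions) is hypothetical and chosen only to illustrate the idea; none of it reflects the record format that was eventually implemented.

```typescript
// Hypothetical record written to mappings.json.
type MappingRecord =
  | { type: 'index'; value: { index: string } }
  | { type: 'data_stream'; value: { data_stream: string } };

// Hypothetical record written to data.json.
interface DocRecord {
  type: 'doc';
  value: { index: string; data_stream?: string; source: Record<string, unknown> };
}

// Step 1: decide whether a name from the "indices" argument is a data stream.
// knownDataStreams is assumed to come from listing data streams in the cluster.
function toMappingRecord(name: string, knownDataStreams: string[]): MappingRecord {
  return knownDataStreams.indexOf(name) !== -1
    ? { type: 'data_stream', value: { data_stream: name } }
    : { type: 'index', value: { index: name } };
}

// Step 2: work out what has to be cleared before loading an archive.
function planCleanup(records: MappingRecord[]): { indices: string[]; dataStreams: string[] } {
  const plan = { indices: [] as string[], dataStreams: [] as string[] };
  for (const record of records) {
    if (record.type === 'data_stream') {
      plan.dataStreams.push(record.value.data_stream);
    } else {
      plan.indices.push(record.value.index);
    }
  }
  return plan;
}

// Step 3: docs flagged with a data stream are appended via "create".
function bulkOperationFor(doc: DocRecord): 'create' | 'index' {
  return doc.value.data_stream !== undefined ? 'create' : 'index';
}
```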

@streamich streamich added Team:Operations Team label for Operations Team triage_needed labels Jun 15, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-operations (Team:Operations)

@joshdover
Contributor

This is becoming more of an issue as we start to add deeper integrations between the Fleet UI and data ingested by Elastic Agent, which exclusively uses data streams. We'd like to be able to easily write tests against data like this, and ES Archiver would be a big help.

Is there any movement on prioritization of this issue? If not, would the team be open to pull requests from the Fleet team?

@tylersmalley tylersmalley removed loe:small Small Level of Effort impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. EnableJiraSync labels Mar 16, 2022
@spalger
Contributor

spalger commented Mar 16, 2022

Definitely open to helping get someone from the Fleet team started here. It should be pretty easy to implement; we just have a lot in progress right now.

@exalate-issue-sync exalate-issue-sync bot added the impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. label Mar 22, 2022
@jasonrhodes
Member

@joshdover do you have someone you could divert to the initial work here? @smith this will likely become more and more important for us as well, so it might be nice to donate a few engineer hours to help test/review any changes that come out of this. We could possibly pick up the work, but it may not be for a little while if we go that route.

@mitodrummer
Contributor

Just ran into a need for this so that I can debug some workload-specific data from the logs-endpoint.process data stream in a cloud deployment on my local ES. ++ 👍

@klacabane
Contributor

Worked on stack monitoring integration tests and took the opportunity to add data stream support: #132853
