Add support for data streams in ES Archiver #69061

jonathan-buttner · 2020-06-12T20:09:54Z

Describe the feature

It would be useful for testing to have support for data streams in ES Archiver. Data streams cause a couple of issues with ES Archiver's current implementation.

Append-only writes

Data streams require append-only writes, which must be done using the create operation when inserting a document into a data stream. A temporary fix was added in this PR: #68794
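As a rough illustration of the create requirement, here is a minimal sketch of building a `_bulk` request body that switches between the `index` and `create` actions. The helper name `buildBulkBody` and its `isDataStream` flag are hypothetical and are not taken from ES Archiver; only the bulk action names and the newline-delimited body format come from Elasticsearch.

```typescript
// Hypothetical helper for illustration; not part of ES Archiver.
type SourceDoc = Record<string, unknown>;

// Build a newline-delimited _bulk request body. Data streams only accept
// the `create` action, so the action name depends on the target type.
function buildBulkBody(target: string, docs: SourceDoc[], isDataStream: boolean): string {
  const action = isDataStream ? 'create' : 'index';
  const lines: string[] = [];
  for (const doc of docs) {
    lines.push(JSON.stringify({ [action]: { _index: target } }));
    lines.push(JSON.stringify(doc));
  }
  // The bulk API requires the body to end with a newline.
  return lines.join('\n') + '\n';
}

// Example: one document bound for the data stream named in the error below.
const bulkBody = buildBulkBody(
  'events-endpoint-1',
  [{ '@timestamp': '2020-06-12T20:09:54Z', message: 'example' }],
  true
);
```

The resulting string could be sent as the body of a `_bulk` request by whichever client the caller already uses.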

If you try to add a document without using create, you will get this error:

"name": "ResponseError",
  "meta": {
    "body": {
      "error": {
        "root_cause": [
          {
            "type": "illegal_argument_exception",
            "reason": "The provided expression [events-endpoint-1] matches a data stream, specify the corresponding concrete indices instead."
          }
        ],
        "type": "illegal_argument_exception",
        "reason": "The provided expression [events-endpoint-1] matches a data stream, specify the corresponding concrete indices instead."
      },
      "status": 400
    },
    "statusCode": 400,

Another option is to insert documents directly into the backing indices of a data stream, but doing this doesn't actually create the data stream.

Data stream specific APIs

ES Archiver needs to use new APIs to create and delete a data stream: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/data-stream-apis.html

The templates used for a data stream also need to include specific fields.
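To make the API requirements concrete, here is a minimal sketch of the requests involved, expressed as plain descriptors rather than client calls. The `EsRequest` shape and the function names are hypothetical. The `/_data_stream/<name>` and `/_index_template/<name>` paths are Elasticsearch endpoints, and the template body assumes the form where a `data_stream` object marks the template as a data stream template; the exact required fields have varied across 7.x versions, so check the linked reference for the version in use.

```typescript
// Hypothetical request descriptor; a real implementation would hand these
// to an Elasticsearch client instead of returning plain objects.
interface EsRequest {
  method: 'PUT' | 'DELETE';
  path: string;
  body?: Record<string, unknown>;
}

// Creating a data stream only works if a matching index template exists.
function createDataStreamRequest(streamName: string): EsRequest {
  return { method: 'PUT', path: `/_data_stream/${encodeURIComponent(streamName)}` };
}

// Deleting the data stream also removes its backing indices.
function deleteDataStreamRequest(streamName: string): EsRequest {
  return { method: 'DELETE', path: `/_data_stream/${encodeURIComponent(streamName)}` };
}

// Assumed template shape: a `data_stream` object plus a date-typed
// `@timestamp` mapping. Field requirements depend on the ES version.
function putDataStreamTemplateRequest(templateName: string, indexPattern: string): EsRequest {
  return {
    method: 'PUT',
    path: `/_index_template/${encodeURIComponent(templateName)}`,
    body: {
      index_patterns: [indexPattern],
      data_stream: {},
      template: {
        mappings: { properties: { '@timestamp': { type: 'date' } } },
      },
    },
  };
}
```

A loader would issue the template request first, then the create request, and would issue the delete request when clearing existing data.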

Describe a specific use case for the feature

The endpoint team leverages data streams for ingesting endpoint data. Having full support for creating/deleting data streams would aid our development of the endpoint app.

Implementation details

Here are some implementation details from a conversation with @spalger:

Slack Conversation

When we create an archive, we'll probably get a string in the "indices" list passed to node scripts/es_archiver save {name} {indices}, and from there we could identify that the "index" is actually a data stream.

From there we should be able to write a record of type "data_stream" to mappings.json so that we know what to clear before we write data.

We would also attach a property to doc records written to data.json files that indicates the document should be passed to a specific data stream rather than indexed like a normal doc.
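The proposal above could be sketched roughly as follows. All record shapes and names here (`DataStreamRecord`, the `data_stream` property on doc records, `dataStreamsToClear`, `bulkActionFor`) are assumptions made for illustration; the conversation only says that a "data_stream" record type and a marker property on doc records would exist, not what they look like.

```typescript
// Hypothetical archive record shapes; the real archive format may differ.
interface DataStreamRecord {
  type: 'data_stream';
  value: { name: string };
}
interface IndexRecord {
  type: 'index';
  value: { index: string };
}
interface DocRecord {
  type: 'doc';
  // `data_stream` is the assumed marker property on doc records.
  value: { index: string; data_stream?: string; source: Record<string, unknown> };
}
type ArchiveRecord = DataStreamRecord | IndexRecord | DocRecord;

// Names of the data streams that should be cleared before loading data.
function dataStreamsToClear(records: ArchiveRecord[]): string[] {
  return records
    .filter((r): r is DataStreamRecord => r.type === 'data_stream')
    .map((r) => r.value.name);
}

// Pick the bulk action for a doc: `create` against the data stream when the
// marker is present, otherwise a normal `index` into the recorded index.
function bulkActionFor(record: DocRecord): Record<string, { _index: string }> {
  const stream = record.value.data_stream;
  return stream !== undefined
    ? { create: { _index: stream } }
    : { index: { _index: record.value.index } };
}
```

Keeping the marker on each doc record means the loader can decide per document, so an archive could mix regular indices and data streams.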

elasticmachine · 2020-06-15T12:20:05Z

Pinging @elastic/kibana-operations (Team:Operations)

joshdover · 2022-03-15T10:51:51Z

This is becoming more of an issue as we start to add deeper integrations between the Fleet UI and data ingested by Elastic Agent, which exclusively uses data streams. We'd like to be able to easily write tests against data like this, and ES Archiver would be a big help.

Is there any movement on prioritization of this issue? If not, would the team be open to pull requests from the Fleet team?

spalger · 2022-03-16T15:06:27Z

Definitely open to helping get someone from the Fleet team started here. It should be pretty easy to implement; we just have a lot in progress right now.

jasonrhodes · 2022-04-18T17:55:17Z

@joshdover do you have someone you could divert to the initial work here? @smith this will likely become more and more important for us as well, so it might be nice to donate a few engineer hours to help test/review any changes that come out of this. We could possibly pick up the work, but it may not be for a little while if we go that route.

mitodrummer · 2022-04-25T19:40:07Z

Just ran into a need for this, so that I can debug some workload-specific data from the logs-endpoint.process data stream in a cloud deployment on my local ES. ++ 👍

klacabane · 2022-05-29T20:39:38Z

Worked on stack monitoring integration tests and took the opportunity to add data stream support in #132853

streamich added Team:Operations and triage_needed labels Jun 15, 2020

spalger mentioned this issue Jul 1, 2020

[ES Snapshots Failure] Endpoint tests unable to create data stream from ES archive #70535

Closed

michalpristas mentioned this issue Aug 4, 2020

[Elastic Agent] Agent datastreams are conflicting with Filebeat setup elastic/beats#19369

Closed

jbudz mentioned this issue Jun 15, 2021

[esArchiver] support archiving of data streams #98366

Closed

tylersmalley added 1 and removed 1 labels Oct 11, 2021

exalate-issue-sync bot added impact:low and loe:small labels Oct 12, 2021

tylersmalley added the EnableJiraSync label Oct 14, 2021

neptunian mentioned this issue Nov 24, 2021

[Stack Monitoring] Testing strategy for agent/integration data #119658

Closed

miltonhultgren mentioned this issue Nov 29, 2021

[Infrastructure Monitoring] Better data generation #119491

Closed

1 task

nchaulet mentioned this issue Mar 14, 2022

[Fleet] Add agent incoming data endpoint and presentational component #127177

Merged

9 tasks

tylersmalley removed loe:small, impact:low, and EnableJiraSync labels Mar 16, 2022

exalate-issue-sync bot added the impact:needs-assessment label Mar 22, 2022

joshdover assigned klacabane Jun 1, 2022

joshdover mentioned this issue Jun 1, 2022

esArchiver datastream support #132853

Merged

klacabane closed this as completed in #132853 Jun 2, 2022
