
esArchiver datastream support #132853

Merged (20 commits) · Jun 2, 2022

Conversation

@klacabane klacabane (Contributor) commented May 24, 2022

Summary

Fixes #69061

Adds support for archiving/loading/unloading data streams.

When archiving indices we now check whether an index is a backing index of a data stream. When it is, we build the data stream's index template, resolving any component template links, and save it as a data_stream record type in the mappings.json file:

{
  "type": "data_stream",
  "value": {
    "data_stream": "my-data-stream-one",
    "template": {
      "index_patterns": ["my-data-stream-*"],
      "mappings": { ... },
      "settings": { ... },
      ...
    }
  }
}
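
For illustration only (this is not the PR's implementation), here is a minimal TypeScript sketch of how such a record could be assembled with the @elastic/elasticsearch client, assuming a local cluster and an existing data stream; getDataStream, getIndexTemplate and simulateTemplate are used to resolve the component templates into the effective mappings and settings:

import { Client } from '@elastic/elasticsearch';

// Hypothetical standalone setup; es_archiver wires its client from the --es-url flag.
const client = new Client({ node: 'http://elastic:changeme@localhost:9200' });

async function buildDataStreamRecord(dataStreamName: string) {
  // The data stream API reports which composable index template it was created from.
  const { data_streams: dataStreams } = await client.indices.getDataStream({
    name: dataStreamName,
  });
  const templateName = dataStreams[0].template;

  // The index template provides the index patterns...
  const { index_templates: indexTemplates } = await client.indices.getIndexTemplate({
    name: templateName,
  });
  const indexPatterns = indexTemplates[0].index_template.index_patterns;

  // ...while the simulate API resolves its component templates into
  // effective mappings and settings.
  const { template } = await client.indices.simulateTemplate({ name: templateName });

  return {
    type: 'data_stream' as const,
    value: {
      data_stream: dataStreamName,
      template: {
        index_patterns: indexPatterns,
        mappings: template.mappings,
        settings: template.settings,
      },
    },
  };
}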

While multiple backing indices of the same data stream can be returned, we only output a single entry, containing the latest mappings and settings.

The documents associated with a data stream keep the same doc type but carry an additional data_stream property, used at load time to index them to the appropriate target (we can't write directly to a backing index) and to pick the correct bulk operation (data streams only accept create).
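
A rough sketch of that load-time decision (illustrative only; the DocRecord shape below is an assumption, not the archiver's actual types):

interface DocRecord {
  index: string;
  data_stream?: string;
  source: Record<string, unknown>;
}

// Data streams must be addressed by name and only accept the `create` bulk op;
// plain indices keep the default `index` op targeting the stored index name.
function toBulkOperation(doc: DocRecord) {
  const op = doc.data_stream ? 'create' : 'index';
  const target = doc.data_stream ?? doc.index;
  return [{ [op]: { _index: target } }, doc.source];
}

// e.g. await client.bulk({ operations: docs.flatMap(toBulkOperation) });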

Note:
The index template could reference an ILM policy, which we currently don't save or create when loading the archive. We can add that if there are use cases that would benefit from it.

Testing

  • the new data streams paths are unit tested
  • functional test suites are green
  • archived local and cloud data streams

Manual steps

  • Create a data stream or target an existing one (see Set up a data stream)
  • Archive it
    node scripts/es_archiver.js save ~/my-data-stream my-data-stream --es-url=http://elastic:changeme@localhost:9200 --kibana-url=http://elastic:changeme@localhost:5601/pat
  • Load the archive (although loading removes existing resources, testing against a clean cluster is recommended)
    node scripts/es_archiver.js load ~/my-data-stream --es-url=http://elastic:changeme@localhost:9200 --kibana-url=http://elastic:changeme@localhost:5601/pat
  • Inspect the loaded data stream and template
  • Unload the archive
    node scripts/es_archiver.js unload ~/my-data-stream --es-url=http://elastic:changeme@localhost:9200 --kibana-url=http://elastic:changeme@localhost:5601/pat
  • Verify the resources are gone (see the verification sketch below)
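
A possible way to script that verification step (a sketch assuming a local cluster; the template name below is hypothetical, and curl or Kibana Dev Tools work just as well):

import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://elastic:changeme@localhost:9200' });

async function verifyUnloaded(dataStream: string, templateName: string) {
  // `indices.exists` resolves data streams as well as indices and aliases.
  const streamExists = await client.indices.exists({ index: dataStream });
  const templateExists = await client.indices.existsIndexTemplate({ name: templateName });
  // Both should be false after a successful unload.
  console.log({ streamExists, templateExists });
}

verifyUnloaded('my-data-stream', 'my-data-stream-template').catch(console.error);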

@klacabane klacabane changed the title aliases fallback esArchiver datastream support May 25, 2022
@klacabane (Contributor, Author)

@elasticmachine merge upstream

@klacabane klacabane added the Team:Operations and v8.3.0 labels May 29, 2022
@klacabane klacabane self-assigned this May 29, 2022
@klacabane klacabane marked this pull request as ready for review May 29, 2022 18:54
@klacabane klacabane requested a review from a team as a code owner May 29, 2022 18:54
@elasticmachine (Contributor)

Pinging @elastic/kibana-operations (Team:Operations)

@spalger spalger (Contributor) left a comment

LGTM, thank you so much for getting this in!

Comment on lines 78 to +82
// if keepIndexNames is false, rewrite the .kibana_* index to .kibana_1 so that
// when it is loaded it can skip migration, if possible
index:
hit._index.startsWith('.kibana') && !keepIndexNames ? '.kibana_1' : hit._index,
data_stream: dataStream,
@spalger spalger (Contributor) commented Jun 2, 2022

Nit: Part of me would prefer that docs either had an index or a data_stream, but I'm not opposed to keeping the index if there's some use for it.

@klacabane (Contributor, Author) replied:

I mainly kept it for traceability when debugging or inspecting archived data; besides that, there's no real use for it :)

@klacabane (Contributor, Author)

@elasticmachine merge upstream

@kibana-ci (Collaborator)

💚 Build Succeeded

Metrics [docs]

✅ unchanged


To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @klacabane

@klacabane klacabane merged commit 4c4f0f5 into elastic:main Jun 2, 2022
@kibanamachine (Contributor)

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create backports automatically, add the auto-backport label, or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 132853 locally

@kibanamachine kibanamachine added the backport missing label Jun 6, 2022
@spalger spalger added v8.4.0 and removed v8.3.0 labels Jun 6, 2022
@kibanamachine kibanamachine added the backport:skip label and removed the backport missing label Jun 6, 2022
@klacabane klacabane added the auto-backport and v8.3.1 labels and removed the backport:skip label Jun 24, 2022
kibanamachine pushed a commit that referenced this pull request Jun 24, 2022
* aliases fallback

* nasty datastream support implementation

* datastreams stats method

* update filter stream

* datastream support for unload action

* create-index datastream support

* index records data stream support

* doc records data streams support

* [CI] Auto-commit changed files from 'node scripts/eslint --no-cache --fix'

* lint

* pull composable templates

* set data_stream as a separate property on documents

* force create bulk operation when datastream record

* [CI] Auto-commit changed files from 'node scripts/eslint --no-cache --fix'

* lint

* getIndexTemplate tests

* [CI] Auto-commit changed files from 'node scripts/precommit_hook.js --ref HEAD~1..HEAD --fix'

* share cache across transform executions

Co-authored-by: kibanamachine <[email protected]>
(cherry picked from commit 4c4f0f5)
@kibanamachine (Contributor)

💚 All backports created successfully

Branch: 8.3 (backport created)

Note: Successful backport PRs will be merged automatically after passing CI.

Questions?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Jun 24, 2022

Co-authored-by: Kevin Lacabane <[email protected]>
Labels
auto-backport · release_note:enhancement · Team:Operations · v8.3.1 · v8.4.0

Successfully merging this pull request may close these issues.

Add support for data streams in ES Archiver