Immediately upgrading a downgraded tsdb data stream fails #96163

Open
martijnvg opened this issue May 16, 2023 · 10 comments
Labels
>docs General docs changes :StorageEngine/TSDB You know, for Metrics Team:Docs Meta label for docs team Team:StorageEngine

@martijnvg
Member

Immediately upgrading a data stream to tsdb after it has been downgraded from a tsdb data stream fails to execute.
This is because a tsdb backing index already exists and the rollover doesn't detect that, because the data stream is non-tsdb.

Note that after waiting ~4hrs the rollover should succeed.

Reproduction:

PUT _index_template/1
{
  "index_patterns": [
    "test*"
  ],
  "template": {
    "settings": {
      "index": {
        "mode": "time_series"
      }
    },
    "mappings": {
      "properties": {
        "my_field": {
          "time_series_dimension": true,
          "type": "keyword"
        }
      }
    }
  },
  "data_stream": {}
}

POST test1/_doc
{
  "@timestamp": "2023-05-16T11:49:50.599Z",
  "my_field": "value"
}

PUT _index_template/1
{
    "index_patterns": [
        "test*"
    ],
    "template": {
        "settings": {
            "index": {
                "mode": null
            }
        },
        "mappings": {
            "properties": {
                "my_field": {
                    "time_series_dimension": true,
                    "type": "keyword"
                }
            }
        }
    },
    "data_stream": {}
}

POST test1/_rollover

PUT _index_template/1
{
    "index_patterns": [
        "test*"
    ],
    "template": {
        "settings": {
            "index": {
                "mode": "time_series"
            }
        },
        "mappings": {
            "properties": {
                "my_field": {
                    "time_series_dimension": true,
                    "type": "keyword"
                }
            }
        }
    },
    "data_stream": {}
}

POST test1/_rollover
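
The three template bodies in the reproduction differ only in `index.mode`. A minimal Python sketch that makes this explicit and lists the request sequence follows; the helper name `template_body` and the `steps` list are illustrative only, and nothing here talks to a cluster.

```python
import json


def template_body(index_mode):
    """Build the index template body used in the reproduction.

    index_mode is "time_series" for tsdb, or None to reset index.mode.
    """
    return {
        "index_patterns": ["test*"],
        "template": {
            "settings": {"index": {"mode": index_mode}},
            "mappings": {
                "properties": {
                    "my_field": {"time_series_dimension": True, "type": "keyword"}
                }
            },
        },
        "data_stream": {},
    }


# The reproduction as (method, path, body) tuples, in order.
steps = [
    ("PUT", "_index_template/1", template_body("time_series")),
    ("POST", "test1/_doc", {"@timestamp": "2023-05-16T11:49:50.599Z", "my_field": "value"}),
    ("PUT", "_index_template/1", template_body(None)),           # downgrade
    ("POST", "test1/_rollover", None),
    ("PUT", "_index_template/1", template_body("time_series")),  # upgrade again
    ("POST", "test1/_rollover", None),                           # the rollover that fails
]

for method, path, body in steps:
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```

Printing the steps gives the same requests as above, which can be pasted into a console.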
@martijnvg martijnvg added >bug :StorageEngine/TSDB You know, for Metrics labels May 16, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 16, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

kpollich added a commit to elastic/kibana that referenced this issue May 16, 2023
… installed (#157869)

## Summary

Fixes #157345

When a package with a changed `index.mode` or `source.mode` setting is
installed, Fleet will now automatically perform a rollover to ensure the
correct setting is present on the resulting backing index.

There is an issue with Elasticsearch wherein toggling these settings
back and forth will incur a backing index range overlap error. See
elastic/elasticsearch#96163.

To test
1. Install the `system` integration at version `1.28.0`
2. Create an integration policy for the `system` integration (a standard
default agent policy will do)
3. Enroll an agent in this policy, and allow it to ingest some data
4. Confirm that there are documents present in the
`metrics-system.cpu-default` data stream, and note its backing index via
Stack Management
5. Create a new `1.28.1` version of the `system` integration where
`elasticsearch.index_mode: time_series` is set and install it via
`elastic-package install --zip`
6. Confirm that a rollover occurs and the backing index for the
`metrics-system.cpu-default` data stream has been updated

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

---------

Co-authored-by: Kibana Machine <[email protected]>
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue May 16, 2023
… installed (elastic#157869)

(cherry picked from commit 22e3847)
kibanamachine added a commit to elastic/kibana that referenced this issue May 16, 2023
…ged is installed (#157869) (#157916)

# Backport

This will backport the following commits from `main` to `8.8`:
- [[Fleet] Rollover data streams when package w/ TSDB setting changed is
installed (#157869)](#157869)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

Co-authored-by: Kyle Pollich <[email protected]>
@mlunadia

@martijnvg what is the recommended cooling time between changing index for seeing no rollover issues? Do you think this and the manual change mechanism should be documented by Kibana or ES?

@martijnvg
Member Author

what is the recommended cooling time between changing index for seeing no rollover issues?

I think by default the recommended cooling time should be 4 hours. But this could be less if downgrading happened some time later after the last tsdb rollover.

But this also depends on whether a custom index.look_ahead_time has been set. This defaults to 2 hours. The first backing index will have a start time of now - look_ahead_time and an end time of now + look_ahead_time.
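
The 4-hour figure can be checked with a small Python sketch of the range arithmetic described above. It is not Elasticsearch code: the downgrade timestamp is hypothetical, and it assumes the old tsdb backing index's end time sits at downgrade time + look_ahead_time and that ranges are half-open.

```python
from datetime import datetime, timedelta, timezone

LOOK_AHEAD = timedelta(hours=2)  # default index.look_ahead_time per the comment


def first_index_range(now, look_ahead=LOOK_AHEAD):
    """Time range of a first tsdb backing index created at `now`."""
    return now - look_ahead, now + look_ahead


def overlaps(a, b):
    """True if two half-open [start, end) ranges intersect."""
    return a[0] < b[1] and b[0] < a[1]


# Hypothetical timeline: the data stream was downgraded at 12:00 UTC.
# Assumption: the old tsdb index ends at downgrade time + look_ahead_time.
downgraded_at = datetime(2023, 5, 16, 12, 0, tzinfo=timezone.utc)
old_tsdb_range = (downgraded_at - LOOK_AHEAD, downgraded_at + LOOK_AHEAD)

for hours_waited in (0, 1, 3, 4):
    now = downgraded_at + timedelta(hours=hours_waited)
    clash = overlaps(first_index_range(now), old_tsdb_range)
    print(f"{hours_waited}h after downgrade: overlap={clash}")
```

Under these assumptions the new index's range stops overlapping the old one once 2 x look_ahead_time (4 hours by default) has passed since the downgrade.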

Do you think this and the manual change mechanism should be documented by Kibana or ES?

I don't think we have documented upgrading to and downgrading from tsdb. I think Elasticsearch should have docs around this, and I think Kibana should too (a minimised version of it).

@felixbarny
Member

Is there a way for ES to automatically adjust the end time to the max @timestamp when rolling over a data stream? That would eliminate the issue assuming that the actual timestamps in that index are lower than the current time. Alternatively, could ES create a new backing index that has a start time that's higher than the existing backing indices' end time?

@martijnvg
Member Author

Is there a way for ES to automatically adjust the end time to the max @timestamp when rolling over a data stream?

In the context of the rollover operation the information required to update the index.time_series.end_time isn't available.
Maybe on a downgraded data stream we could trim the index.time_series.end_time index setting based on the highest @timestamp in the backing index. But this would need to be done in a separate api call. This api doesn't exist today.

Alternatively, could ES create a new backing index that has a start time that's higher than the existing backing indices' end time?

Yes, but that index could be up to 4 hours in the future and will not end up getting used. Meanwhile current writes will go to the older tsdb backing index. And my concern is that if this downgrade and upgrade cycle happens again then we end up with another tsdb backing index but then up to 8 hours in the future.

jasonrhodes pushed a commit to elastic/kibana that referenced this issue May 17, 2023
… installed (#157869)

@martijnvg martijnvg self-assigned this May 23, 2023
@martijnvg martijnvg added >docs General docs changes and removed >bug labels May 26, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Docs Meta label for docs team label May 26, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-docs (Team:Docs)

@martijnvg
Member Author

This issue was discussed in yesterday's tsdb integration sync. The fact that the way downgrading from tsdb and upgrading to tsdb works causes this bug isn't ideal, but it isn't something that we will address. This is because immediately upgrading a downgraded data stream to tsdb isn't a use case we need to support. It is okay if there is some wait time before upgrading to tsdb again.

We do need to document this as part of upgrading to tsdb and downgrading from tsdb.

@wchaparro wchaparro removed the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 20, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@nchaulet
Member

nchaulet commented Oct 7, 2024

It happens in Fleet we have to rollback our integrations and Fleet can trigger automatic upgrades and having to wait n hours to be able to upgrade again because of that behaviour, it is not optimal, both for Fleet and for our users, should/could this be automatically handled by elasticsearch?

@martijnvg
Member Author

It happens in Fleet we have to rollback our integrations and Fleet can trigger automatic upgrades

Our assumption was that rollbacks should occur rarely. Typically an integration's template / mapping has to be modified in order to be ready for tsdb. After testing, the chance of rolling back should be small. Unless there is some unforeseen bug or the tradeoffs that come with tsdb don't work out well. In that case the second upgrade to tsdb could be days / weeks after the rollback.

and having to wait n hours to be able to upgrade again because of that behaviour,

In recent versions, the wait time is lower. If the index.look_back_time setting is set to 1 minute, then migrating to tsdb after a rollback can occur as soon as 31 minutes after the rollback.
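
A rough Python sketch of where the 31 minutes could come from. The model is an assumption rather than Elasticsearch source code: the old tsdb index ends at rollback time + look_ahead_time, the new first index starts at upgrade time - look_back_time, and look_ahead_time is taken to be 30 minutes, which is what makes the figure add up.

```python
from datetime import timedelta


def min_wait_after_rollback(look_back: timedelta, look_ahead: timedelta) -> timedelta:
    """Shortest wait before a re-upgrade's first index no longer overlaps.

    Model (an assumption, not Elasticsearch source code):
      old tsdb index end   = rollback_time + look_ahead
      new tsdb index start = upgrade_time - look_back
    No overlap requires upgrade_time - look_back >= rollback_time + look_ahead.
    """
    return look_back + look_ahead


# look_back_time from the comment; the 30-minute look_ahead_time is assumed.
recent = min_wait_after_rollback(timedelta(minutes=1), timedelta(minutes=30))
# Both sides at 2 hours, as described earlier in the thread.
original = min_wait_after_rollback(timedelta(hours=2), timedelta(hours=2))

print(f"recent settings: wait at least {recent}")
print(f"2h/2h settings:  wait at least {original}")
```

With both values at 2 hours the same formula gives the 4-hour wait mentioned earlier in the thread.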

it is not optimal, both for Fleet and for our users, should/could this be automatically handled by elasticsearch?

This is something we can address, but it has always had lower priority than other work. This was mainly based on the assumption that upgrading minutes to hours after a rollback isn't a common scenario.
