Record timestamp field range in index metadata #65689

DaveCTurner
Contributor

Queries including a filter by timestamp range are common in time-series
data. Moreover, older time-series indices are typically made read-only so
that the timestamp range becomes immutable. By recording in the index
metadata the range of timestamps covered by each index, we can very
efficiently skip shards on the coordinating node, even if those shards
are not assigned.

This commit computes the timestamp range of immutable indices and
records it in the index metadata as the shards start for the first time.
Note that the only indices it considers immutable today are ones using
the `ReadOnlyEngine`, which includes frozen indices and searchable
snapshots but not regular indices with a write block.

Backport of #65564
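The bookkeeping described above can be sketched roughly as follows. This is a minimal, hypothetical Java illustration, not the code from this PR: the class `IndexTimestampRange` and its methods are invented names, and it assumes each shard of an immutable index reports an inclusive `[min, max]` timestamp (epoch millis) once, when it first starts. The index-wide range is the union of the shard ranges and is treated as unknown until every shard has reported. For brevity it ignores shards that contain no documents.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: accumulates per-shard timestamp ranges into an index-wide range. */
final class IndexTimestampRange {
    private final int numberOfShards;
    // shard id -> {min, max} timestamp reported when that shard first started
    private final Map<Integer, long[]> shardRanges = new HashMap<>();

    IndexTimestampRange(int numberOfShards) {
        this.numberOfShards = numberOfShards;
    }

    /** Records the range of an immutable shard; later reports for the same shard are ignored. */
    void recordShard(int shardId, long minTimestamp, long maxTimestamp) {
        if (shardId < 0 || shardId >= numberOfShards) {
            throw new IllegalArgumentException("unknown shard " + shardId);
        }
        if (minTimestamp > maxTimestamp) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        // Only the first start counts: the data is immutable, so the range cannot change.
        shardRanges.putIfAbsent(shardId, new long[] {minTimestamp, maxTimestamp});
    }

    /** Returns {min, max} for the whole index once every shard has reported, else empty (unknown). */
    Optional<long[]> indexRange() {
        if (shardRanges.size() < numberOfShards) {
            return Optional.empty();
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long[] range : shardRanges.values()) {
            min = Math.min(min, range[0]);
            max = Math.max(max, range[1]);
        }
        return Optional.of(new long[] {min, max});
    }
}
```

Keeping the range unknown until all shards have reported matters: a partial range could exclude documents held by a shard that has not started yet, which would make skipping unsafe.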

DaveCTurner added the :Distributed Coordination/Snapshot/Restore, backport and v7.11.0 labels Dec 1, 2020
elasticmachine added the Team:Distributed (Obsolete) label Dec 1, 2020
elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

DaveCTurner force-pushed the 2020-12-01-record-timestamp-field-range-in-index-metadata-7x branch from d93986d to 96eadfb December 2, 2020 10:09
Queries including a filter by timestamp range are common in time-series
data. Moreover, older time-series indices are typically made read-only so
that the timestamp range becomes immutable. By recording in the index
metadata the range of timestamps covered by each index, we can very
efficiently skip shards on the coordinating node, even if those shards
are not assigned.

This commit computes the timestamp range of immutable indices and
records it in the index metadata as the shards start for the first time.
Note that the only indices it considers immutable today are ones using
the `ReadOnlyEngine`, which includes frozen indices and searchable
snapshots but not regular indices with a write block.

Backport of elastic#65564 and elastic#65720
DaveCTurner force-pushed the 2020-12-01-record-timestamp-field-range-in-index-metadata-7x branch from 96eadfb to 2c8957c December 2, 2020 10:12
DaveCTurner
Contributor Author

D'oh, I didn't realise I had already opened a PR for this branch; sorry for the force-pushing.

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Dec 2, 2020
DaveCTurner merged commit 31503aa into elastic:7.x Dec 2, 2020
DaveCTurner deleted the 2020-12-01-record-timestamp-field-range-in-index-metadata-7x branch December 2, 2020 11:11
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Dec 2, 2020
DaveCTurner added a commit that referenced this pull request Dec 2, 2020
fcofdez added a commit to fcofdez/elasticsearch that referenced this pull request Dec 9, 2020
This commit introduces an optimization that allows skipping unnecessary
shards directly on the coordinator for time-based indices.
This is possible for frozen indices and searchable snapshots since those store
their min/max timestamp range in their IndexMetadata (introduced in elastic#65689).
For indices that don't have that information available, the behaviour is
the same as it used to be.
fcofdez added a commit that referenced this pull request Dec 15, 2020
…vailable at the coordinator. (#65583)

This commit introduces an optimization that allows skipping unnecessary
shards directly on the coordinator for time-based indices.
This is possible for frozen indices and searchable snapshots since those store
their min/max timestamp range in their IndexMetadata (introduced in #65689).
For indices that don't have that information available, the behaviour is
the same as it used to be.
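The coordinator-side decision described in this commit reduces to an interval-overlap test. A minimal sketch, with hypothetical names (`CoordinatorShardSkipper`, `Range`, `canSkipIndex`) rather than the Elasticsearch API, assuming the query's timestamp filter and the recorded index range are both inclusive epoch-millisecond intervals. When no range is recorded, the sketch conservatively returns `false` so the shards are queried as before.

```java
import java.util.Optional;

/** Hypothetical coordinator-side pre-filter based on a recorded index timestamp range. */
final class CoordinatorShardSkipper {

    /** Inclusive timestamp range in epoch milliseconds (hypothetical type). */
    static final class Range {
        final long min;
        final long max;

        Range(long min, long max) {
            if (min > max) {
                throw new IllegalArgumentException("min must not exceed max");
            }
            this.min = min;
            this.max = max;
        }

        /** Two inclusive ranges overlap unless one ends before the other starts. */
        boolean overlaps(Range other) {
            return this.min <= other.max && other.min <= this.max;
        }
    }

    /**
     * Returns true only if the recorded index range proves that no document can match
     * the query range. An absent range (e.g. a regular writable index) means "unknown",
     * so the index is not skipped and its shards are queried as usual.
     */
    static boolean canSkipIndex(Optional<Range> recordedIndexRange, Range queryRange) {
        return recordedIndexRange.map(r -> !r.overlaps(queryRange)).orElse(false);
    }

    private CoordinatorShardSkipper() {}
}
```

Because the check reads only index metadata, it needs nothing from the shards themselves, which is why shards can be skipped even when they are unassigned.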
fcofdez added a commit to fcofdez/elasticsearch that referenced this pull request Dec 15, 2020
…vailable at the coordinator.

This commit introduces an optimization that allows skipping unnecessary
shards directly on the coordinator for time-based indices.
This is possible for frozen indices and searchable snapshots since those store
their min/max timestamp range in their IndexMetadata (introduced in elastic#65689).
For indices that don't have that information available, the behaviour is
the same as it used to be.

Backport of elastic#65583
fcofdez added a commit that referenced this pull request Dec 15, 2020
…a is available at the coordinator. (#66319)

This commit introduces an optimization that allows skipping unnecessary
shards directly on the coordinator for time-based indices.
This is possible for frozen indices and searchable snapshots since those store
their min/max timestamp range in their IndexMetadata (introduced in #65689).
For indices that don't have that information available, the behaviour is
the same as it used to be.

Backport of #65583