Add searchable snapshots actions to ILM #50806

Closed
DaveCTurner opened this issue Jan 9, 2020 · 8 comments
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement


@DaveCTurner
Contributor

Here is how to turn an index into a searchable snapshot today, on the feature/searchable-snapshot branch:

## Create normal repository for taking the snapshot
PUT /_snapshot/backing_repo
{
  "settings": {
    "location": "/Users/davidturner/src/elasticsearch-master/repo/backing"
  },
  "type": "fs"
}

## Create searchable repository for the restore
PUT /_snapshot/searchable_repo
{
  "settings": {
    "location": "/Users/davidturner/src/elasticsearch-master/repo/backing",
    "delegate_type": "fs"
  },
  "type": "searchable"
}

## Create and populate the index
PUT /original
{
  "settings": {
    "index.number_of_replicas": 0
  },
  "aliases": {
    "alias": {}
  }
}

POST /original/_bulk?refresh
{"index":{}}
{"foo":"bar"}
{"index":{}}
{"baz":"quux"}

POST /original/_flush

## Force-merge it to a single segment (optional)

POST /original/_forcemerge?max_num_segments=1

## Verify that there are docs in the index

GET /alias/_search
{
  "size": 0
}

# {
#   "took": 1,
#   "_shards": {
#     "skipped": 0,
#     "successful": 1,
#     "total": 1,
#     "failed": 0
#   },
#   "timed_out": false,
#   "hits": {
#     "max_score": null,
#     "total": {
#       "value": 2,
#       "relation": "eq"
#     },
#     "hits": []
#   }
# }

## Take a snapshot of the target index

POST /_snapshot/backing_repo/snap?wait_for_completion=true
{
  "indices": "original",
  "include_global_state": false
}

## Restore the snapshot to a different name

POST /_snapshot/searchable_repo/snap/_restore
{
  "indices": "original",
  "rename_pattern": "original",
  "rename_replacement": "snapped"
}

GET /_cluster/health?wait_for_status=green

## Adjust the alias to point to the restored index and delete the original index

POST /_aliases
{
  "actions": [
    {
      "add": {
        "alias": "alias",
        "index": "snapped"
      }
    },
    {
      "remove": {
        "alias": "alias",
        "index": "original"
      }
    }
  ]
}

DELETE /original

## Verify that there are still two docs visible to searches

GET /alias/_search
{
  "size": 0
}

# {
#   "took": 2,
#   "_shards": {
#     "skipped": 0,
#     "successful": 1,
#     "total": 1,
#     "failed": 0
#   },
#   "timed_out": false,
#   "hits": {
#     "max_score": null,
#     "total": {
#       "value": 2,
#       "relation": "eq"
#     },
#     "hits": []
#   }
# }

## 🚀

This is too much to expect of our users, and it would be much better if ILM guided the index through this process instead. We propose an ILM action that converts an index to a searchable snapshot by performing the steps above. It would take place after the Force Merge action in the warm phase and after the Freeze action in the cold phase. Today it is not possible to freeze a searchable snapshot, but we may add that functionality in future. It is possible to run the Set Priority, Allocate, and Delete actions on a searchable snapshot.
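To make the scope of such an action concrete, the manual steps above can be written down as an ordered list of requests. This is only a rough Python sketch: the function name `conversion_requests` is hypothetical, it assumes both repositories and the populated index already exist, and it builds the requests without sending them.

```python
def conversion_requests(index, restored, alias, backing_repo, searchable_repo, snapshot):
    """Return the manual conversion steps as ordered (method, path, body) tuples.

    Hypothetical helper: it mirrors the console requests in this issue and
    does not talk to a cluster.
    """
    return [
        ("POST", f"/{index}/_flush", None),
        # Optional in the manual procedure: merge down to a single segment.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
        # Snapshot only this index into the ordinary (backing) repository.
        ("POST", f"/_snapshot/{backing_repo}/{snapshot}?wait_for_completion=true",
         {"indices": index, "include_global_state": False}),
        # Restore through the searchable repository under a new name.
        # rename_pattern is a regular expression; a plain index name is used
        # here, exactly as in the example above.
        ("POST", f"/_snapshot/{searchable_repo}/{snapshot}/_restore",
         {"indices": index, "rename_pattern": index, "rename_replacement": restored}),
        ("GET", "/_cluster/health?wait_for_status=green", None),
        # Move the alias in one atomic call, then drop the original index.
        ("POST", "/_aliases", {"actions": [
            {"add": {"alias": alias, "index": restored}},
            {"remove": {"alias": alias, "index": index}},
        ]}),
        ("DELETE", f"/{index}", None),
    ]
```

Calling `conversion_requests("original", "snapped", "alias", "backing_repo", "searchable_repo", "snap")` reproduces the sequence shown in the example, from the flush through to `DELETE /original`.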

Of particular note is the snapshotting step: we take a snapshot of the single index. This avoids needing to retain a cluster-wide snapshot simply to preserve this single index. It's important that we retain this snapshot until after the index itself is deleted. Once the index is deleted we may (or may not) want to delete the corresponding snapshot.
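The retention rule in the previous paragraph boils down to an ordering constraint: the index goes first, the snapshot (if at all) second. A minimal Python sketch of that ordering follows; the function name `deletion_requests`, the `delete_snapshot` flag, and its default are assumptions made for illustration, and nothing is sent to a cluster.

```python
def deletion_requests(index, backing_repo, snapshot, delete_snapshot=True):
    """Order the deletes so that the snapshot outlives the index that relies on it.

    Hypothetical helper: `delete_snapshot` and its default are illustrative only.
    """
    # The searchable index must be removed before its backing snapshot.
    requests = [("DELETE", f"/{index}")]
    if delete_snapshot:
        # The snapshot was taken into the ordinary (backing) repository.
        requests.append(("DELETE", f"/_snapshot/{backing_repo}/{snapshot}"))
    return requests
```

With `delete_snapshot=False` the snapshot is simply left in the repository, which covers the "may not" case.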

@DaveCTurner DaveCTurner added >enhancement :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs :Data Management/ILM+SLM Index and Snapshot lifecycle management labels Jan 9, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)

@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@dakrone
Member

dakrone commented Jan 10, 2020

Some things to consider:

  • Which phase(s) should we allow converting an index into a searchable snapshot?
  • Should ILM manage deleting the snapshot taken for the index when an index reaches the delete phase? Should this be opt-in or opt-out (or non-configurable)?
  • Interaction with CCR: I believe we'll need to inject an unfollow step prior to converting an index (unless there's a way this transfers to the other cluster?)

@DaveCTurner are two repositories always required? Is there a way to do it with a single repository, or should we plan on specifying both repositories in the ILM action configuration?

@DaveCTurner
Contributor Author

> Which phase(s) should we allow converting an index into a searchable snapshot?

I am inclined to go with the warm phase; @zuketo @matt-davis-elastic any other thoughts from the product side?

> Should ILM manage deleting the snapshot taken for the index when an index reaches the delete phase? Should this be opt-in or opt-out (or non-configurable)?

I would say yes: delete the underlying snapshot by default, but permit opting out.

> I believe we'll need to inject an unfollow step

Yes, I guess this is how we handle the Shrink step too?

> are two repositories always required?

Yes, for the foreseeable future, although I recognise that this isn't the best UX. This follows the path of certain other features that also work by wrapping one repository around another (e.g. source-only snapshots).
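On the CCR point in this exchange, one way to picture an injected unfollow step is as a prefix added to the conversion requests when the index is a follower. The Python sketch below is an illustration only: the helper names are hypothetical, the caller is assumed to know whether the index is a follower, and the pause, close, unfollow, open sequence reflects how a follower index is converted to a regular index with the CCR APIs, not how ILM implements its own step.

```python
def unfollow_requests(index):
    """Requests that turn a CCR follower index into a regular index."""
    return [
        ("POST", f"/{index}/_ccr/pause_follow"),
        # Unfollowing requires the follower to be paused and closed first.
        ("POST", f"/{index}/_close"),
        ("POST", f"/{index}/_ccr/unfollow"),
        ("POST", f"/{index}/_open"),
    ]


def with_unfollow(index, is_follower, conversion):
    """Prepend the unfollow requests when the index is a follower.

    Hypothetical helper: `conversion` is any ordered list of (method, path)
    tuples describing the conversion itself.
    """
    prefix = unfollow_requests(index) if is_follower else []
    return prefix + list(conversion)
```

For a non-follower index the conversion list is returned unchanged, so the extra step costs nothing in the common case.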

@zuketo

zuketo commented Jan 10, 2020

Hi @DaveCTurner, I'm curious about why the warm phase? I agree with it, I'm just curious about your reasoning.

Besides the standard use case of data progressing over time into a searchable snapshot with ILM, there is a less popular use case: getting data into a searchable snapshot as quickly as possible. This could be part of a migration effort, or writing directly to an "archive". I don't think this use case needs ILM; I just want to note it as a potential use case here. Even for this use case, with the conversion happening in the warm phase, the user could configure a very brief hot phase and still use ILM.
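The "very brief hot phase" idea could look roughly like the policy body built below. This is a sketch under stated assumptions: `rollover`, `forcemerge`, and `min_age` are existing ILM settings, but the action name `convert_to_searchable_snapshot` and its two repository fields are placeholders, since this thread does not settle what the action or its options are called.

```python
import json


def build_policy(backing_repo, searchable_repo, hot_max_age="1h"):
    """Build an ILM policy body with a short hot phase and conversion in warm.

    The `convert_to_searchable_snapshot` action and its fields are
    PLACEHOLDERS for the action proposed in this issue.
    """
    return {
        "policy": {
            "phases": {
                # Roll over quickly so that indices leave the hot phase early.
                "hot": {"actions": {"rollover": {"max_age": hot_max_age}}},
                "warm": {
                    # Enter the warm phase as soon as the index has rolled over.
                    "min_age": "0ms",
                    "actions": {
                        "forcemerge": {"max_num_segments": 1},
                        # Placeholder action; it would run after the force merge.
                        "convert_to_searchable_snapshot": {
                            "backing_repository": backing_repo,
                            "searchable_repository": searchable_repo,
                        },
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_policy("backing_repo", "searchable_repo"), indent=2))
```

Both repositories appear in the action body here because two repositories are currently required; a single-repository design would drop one of the fields.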

@dakrone
Member

dakrone commented Jan 10, 2020

I assumed we'd want it in the cold phase, rather than warm, but we could allow it in either warm or cold?

@DaveCTurner
Contributor Author

> I'm curious about why the warm phase?

I expect to achieve performance good enough to make it worth having these things available in the warm phase, and there will be a few other benefits over "standard" warm indices too.

@dakrone
Member

dakrone commented Apr 13, 2020

I'm going to close this now since @andreidan's PR was merged, and the searchable snapshot work was merged to master.

5 participants