[Upgrade Assistant] Server-side batch reindexing #58598

Merged
merged 21 commits into from
Mar 6, 2020

Conversation

jloleysens
Contributor

@jloleysens jloleysens commented Feb 26, 2020

Summary

Add the ability for Upgrade Assistant (UA) to handle reindexing in batches. This contribution focusses on the server-side code.

Two new endpoints are exposed to users:

POST /api/upgrade_assistant/reindex/batch

This new endpoint accepts an array of index names to be reindexed.

Body args

{
    "indexNames": []
}

How to test

  1. Start Kibana with yarn start --verbose and ES with the default configuration
  2. Run the cURL commands below
  3. You should see a response like:
{
  // Top level keys are stable
  "enqueued": [
    {
      // The values in here are subject to change and should not be considered stable!
      "indexName": "test7",
      "newIndexName": "reindexed-v8-test7",
      "status": 3,
      "lastCompletedStep": 0,
      "locked": null,
      "reindexTaskId": null,
      "reindexTaskPercComplete": null,
      "errorMessage": null,
      "runningReindexCount": null,
      "reindexOptions": {
        "queueSettings": {
          "queuedAt": 1583406985489
        }
      }
    },
    {
      "indexName": "test8",
      "newIndexName": "reindexed-v8-test8",
      "status": 3,
      "lastCompletedStep": 0,
      "locked": null,
      "reindexTaskId": null,
      "reindexTaskPercComplete": null,
      "errorMessage": null,
      "runningReindexCount": null,
      "reindexOptions": {
        "queueSettings": {
          "queuedAt": 1583406987334
        }
      }
    }
  ],
  "errors": []
}
  4. Keep an eye on the server logs for output like this:
server    log   [11:34:14.933] [debug][plugins][reindex_worker][upgradeAssistant] Queue detected; current length 2, current item ReindexOperation(id: 9ab22360-5e03-11ea-9d2d-1ff9c3164643, indexName: test12)

This is an indication that each job is being processed in a queue when a batch submission is made.

  5. (Optional) Make sure that multiple batch requests can be submitted at the same time: submit a batch request to reindex test3 and test4, followed by a batch request to reindex test5 and test6 (after you have created each index).

  6. (Optional) Submit a request to reindex ["test7", "test7", "test8"]. The batch endpoint should report that it could not create a new item for the second "test7".
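
One way to picture the duplicate case: the first occurrence of a name is enqueued and any later occurrence ends up under "errors". The sketch below is a simplified in-memory stand-in, not the endpoint's implementation. The function name enqueueBatch, the Set standing in for existing reindex operations, and the shape of each error entry are all assumptions made for illustration.

```typescript
interface BatchResult {
  enqueued: string[];
  // Hypothetical error shape; the real response may differ.
  errors: Array<{ indexName: string; message: string }>;
}

// existingOps stands in for reindex operations that already exist.
function enqueueBatch(indexNames: string[], existingOps: Set<string> = new Set()): BatchResult {
  const result: BatchResult = { enqueued: [], errors: [] };
  for (const indexName of indexNames) {
    if (existingOps.has(indexName)) {
      // A second request for the same index cannot create a new item.
      result.errors.push({
        indexName,
        message: `A reindex operation already exists for ${indexName}`,
      });
      continue;
    }
    existingOps.add(indexName);
    result.enqueued.push(indexName);
  }
  return result;
}

const batchResult = enqueueBatch(['test7', 'test7', 'test8']);
```

Here batchResult.enqueued holds test7 and test8, and batchResult.errors holds one entry for the repeated test7.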

cURL commands
# Change test1 and test2 to the name of any non-existing index
curl -XPUT http://elastic:changeme@localhost:9200/test1
curl -XPUT http://elastic:changeme@localhost:9200/test2

# swap out "pku" with your local path prefix
curl --request POST \
  --url http://localhost:5603/pku/api/upgrade_assistant/reindex/batch \
  --header 'authorization: Basic ZWxhc3RpYzpjaGFuZ2VtZQ==' \
  --header 'content-type: application/json' \
  --header 'kbn-xsrf: xxxx' \
  --data '{
        "indexNames": [
                "test1",
                "test2"
        ]
}'
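
For anyone testing from a script rather than cURL, the same request can be assembled in TypeScript. This is only a sketch: buildBatchReindexRequest is a hypothetical helper, not part of the Upgrade Assistant code, and it only builds the arguments for fetch() without sending anything. The path, headers, and body mirror the cURL command above.

```typescript
// Hypothetical helper: builds the arguments for a fetch() call against the batch endpoint.
interface BatchReindexRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildBatchReindexRequest(
  kibanaUrl: string, // host plus local path prefix, e.g. "http://localhost:5603/pku"
  indexNames: string[],
  basicAuth: string // base64 of "user:password"
): BatchReindexRequest {
  return {
    // Trim trailing slashes so the path is not doubled up.
    url: `${kibanaUrl.replace(/\/+$/, '')}/api/upgrade_assistant/reindex/batch`,
    init: {
      method: 'POST',
      headers: {
        authorization: `Basic ${basicAuth}`,
        'content-type': 'application/json',
        'kbn-xsrf': 'xxxx',
      },
      body: JSON.stringify({ indexNames }),
    },
  };
}

const batchRequest = buildBatchReindexRequest(
  'http://localhost:5603/pku',
  ['test1', 'test2'],
  'ZWxhc3RpYzpjaGFuZ2VtZQ=='
);
// With Kibana running: await fetch(batchRequest.url, batchRequest.init);
```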

GET /api/upgrade_assistant/reindex/batch/queue

This endpoint provides visibility into the current batch queue.

How to test

With Kibana started (the --verbose flag is not required), hit the previous endpoint followed by this endpoint to get a current view of the reindex batch queue. The order in which operations will run is indicated by their order in the array.

{
  "queue": [
    {
      "indexName": "test8",
      "newIndexName": "reindexed-v8-test8",
      "status": 3,
      "lastCompletedStep": 0,
      "locked": null,
      "reindexTaskId": null,
      "reindexTaskPercComplete": null,
      "errorMessage": null,
      "runningReindexCount": null,
      "reindexOptions": {
        "queueSettings": {
          "queuedAt": 1583406987334
        }
      }
    }
  ]
}
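
Since the array order is the processing order, a client can work out where an index sits in the queue from this response alone. A minimal sketch, assuming only the fields shown above; queuePosition is a hypothetical helper and the sample data reuses values from the examples in this description.

```typescript
interface QueuedOp {
  indexName: string;
  reindexOptions?: { queueSettings?: { queuedAt: number } };
}

// Returns the 1-based position of indexName in the queue, or undefined if it is not queued.
function queuePosition(queue: QueuedOp[], indexName: string): number | undefined {
  const i = queue.findIndex(op => op.indexName === indexName);
  return i === -1 ? undefined : i + 1;
}

const queueResponse: { queue: QueuedOp[] } = {
  queue: [
    { indexName: 'test7', reindexOptions: { queueSettings: { queuedAt: 1583406985489 } } },
    { indexName: 'test8', reindexOptions: { queueSettings: { queuedAt: 1583406987334 } } },
  ],
};

const queuedPosition = queuePosition(queueResponse.queue, 'test8');
const queueLength = queueResponse.queue.length;
```

With this sample data, test8 is at position 2 of 2.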

Release Note

Upgrade Assistant has a new batch endpoint that enables submitting multiple indices for reindexing in one network request. Indices are processed one-by-one to minimize cluster resource usage. This also makes it much easier for users to upgrade indices when a new version of the Elastic stack is released.

Checklist

For maintainers

@jloleysens jloleysens added enhancement New value added to drive a business result v8.0.0 Team:Kibana Management Dev Tools, Index Management, Upgrade Assistant, ILM, Ingest Node Pipelines, and more Feature:Upgrade Assistant v7.7.0 labels Feb 26, 2020
@elasticmachine
Contributor

Pinging @elastic/es-ui (Team:Elasticsearch UI)

@jloleysens jloleysens added the release_note:skip Skip the PR/issue when compiling release notes label Feb 26, 2020
@jloleysens
Contributor Author

@elasticmachine merge upstream

@jloleysens
Contributor Author

@elasticmachine merge upstream

Contributor

@alisonelizabeth alisonelizabeth left a comment

Nice job @jloleysens!

Batch reindexing works great. I may have found a small regression with reindexing a single index. If I create an index, POST /api/upgrade_assistant/reindex/<my_index>, then execute that request again, I get an Internal Error error message. However, on master, I get A reindex operation already in-progress for <my_index>, which is what I would expect. Can you take a look?

@jloleysens
Contributor Author

@alisonelizabeth I noticed the error handling was a bit problematic with the reindex service returning Boom errors. There was an error here with differences between the reindex endpoints.

I created this PR #58715 to fix those. Do you think we can merge that one first, then I will resolve the conflicts and we can return to this one?

@alisonelizabeth
Contributor

@jloleysens 👍 sounds good. I'll review that one next.

…dex-server-side

* 'master' of github.com:elastic/kibana: (34 commits)
  [Upgrade Assistant] Remove "boom" from reindex service (elastic#58715)
  [data] Clean up QueryStringInput unit tests (elastic#58704)
  [SIEM] Detection Fix typo in Adobe Hijack Persistence rule (elastic#58804)
  Restores [SIEM][CASE] Init Configure Case Page (elastic#58121) (elastic#58924)
  Skips additional failing Ingest Manager integration tests
  Skips failing Ingest Manager integration tests
  Move dev tools styles to NP (elastic#58855)
  change to have kibana --ssl cli option use more recent certs (elastic#57933)
  disable failing suite (elastic#58942)
  Don't start pollEsNodesVersion unless someone subscribes (elastic#56923)
  Do not write UUID file during optimize process (elastic#58899)
  [Endpoint] Task/add nav bar (elastic#58604)
  [Metric Alerts] Add backend support for multiple expressions per alert  (elastic#58672)
  [Metrics Alerts] Fix alerting on a rate aggregation (elastic#58789)
  disable flaky suite (elastic#55953)
  Revert "[SIEM] apollo@3 (elastic#51926)" and "[SIEM][CASE] Init Confi… (elastic#58806)
  [resubmit] Prep agg types for new platform (elastic#58893)
  [Lens] Allow number formatting within Lens (elastic#56253)
  [Autocomplete] Use settings from config rather than UI settings (elastic#58784)
  Improve action and trigger types (elastic#58657)
  ...

# Conflicts:
#	x-pack/plugins/upgrade_assistant/server/routes/reindex_indices/reindex_indices.ts
"sucesses" does not communicate accurately what has happened.
"started" more closely reflects what has happened.
@jloleysens
Contributor Author

@alisonelizabeth pointed out that we probably don't want to, by default, kick off reindex jobs in parallel for batches as these could be really resource intensive. This will require a queue mechanism and probably some other revisions to the current contribution.

Changed the batchqueues implementation to only using a single queue
 - since there is only one ES that it is interacting with.

Before continuing with this work, just making sure that these
precautions are necessary!
Queue settings can be set on a reindex operation and set a
timestamp value on the reindex operation for the scheduler
to use down the line for ordering operations and running them
in series
Contributor

@alisonelizabeth alisonelizabeth left a comment

Nice job @jloleysens! Code works as described. The regression I found earlier with the error handling has also been resolved 🎉

Would you mind updating the "How to test" section with the updated response (step 3)? Also, I happened to notice there are existing docs for the UA APIs; we should work with Gail to update these whenever we’re ready (https://www.elastic.co/guide/en/kibana/current/upgrade-assistant-api.html).

Other thoughts - maybe this will become more clear once we start working on the UI (and I get more familiar with UA 😄) but, would it be useful at all to know where in the queue the index is and how many total are in the queue in the response? Also, did you test this with a large number of indices (I only did a handful)? Should we impose any sort of limit?

Comment on lines 135 to 157
for (const inProgressOp of inProgressOps) {
  if (inProgressOp.attributes.reindexOptions?.queueSettings) {
    queueOps.push(inProgressOp);
  } else {
    parallelOps.push(inProgressOp);
  }
}

if (queueOps.length) {
  const [firstInQueueOp] = queueOps.sort(
    (a, b) =>
      a.attributes.reindexOptions!.queueSettings!.queuedAt -
      b.attributes.reindexOptions!.queueSettings!.queuedAt
  );

  this.log.debug(
    `Queue detected; current length ${queueOps.length}, current item ReindexOperation(id: ${firstInQueueOp.id}, indexName: ${firstInQueueOp.attributes.indexName})`
  );

  this.inProgressOps = parallelOps.concat(firstInQueueOp);
} else {
  this.inProgressOps = parallelOps;
}
Contributor

@joshdover joshdover Mar 4, 2020

This took me some time to figure out what this was supposed to do. I think this could benefit from a comment that explains the algorithm that decides which reindex operations should be processed.

Alternatively, this could benefit from extracting the logic into a named function that explains what it does:

const inProgressOps = await this.reindexService.findAllByStatus(ReindexStatus.inProgress);
// Of the queued operations, only the first one is eligible to run.
const firstOpInQueue = getFirstInQueue(inProgressOps);
// Operations without queueSettings are not part of the queue and run in parallel.
const unqueuedOps = inProgressOps
  .filter(op => !op.attributes.reindexOptions?.queueSettings);
this.inProgressOps = [...unqueuedOps, ...(firstOpInQueue ? [firstOpInQueue] : [])];
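
Spelled out as a self-contained sketch, the selection could look something like the following. The ReindexOp type is a trimmed-down stand-in for the saved object, and getFirstInQueue, selectOpsToProcess, and makeOp are illustrative names, not necessarily what the merged code uses.

```typescript
interface ReindexOp {
  id: string;
  attributes: {
    indexName: string;
    reindexOptions?: { queueSettings?: { queuedAt: number } };
  };
}

const isQueued = (op: ReindexOp): boolean => !!op.attributes.reindexOptions?.queueSettings;

// The queued operation with the smallest queuedAt timestamp, if any.
function getFirstInQueue(ops: ReindexOp[]): ReindexOp | undefined {
  return ops
    .filter(isQueued) // filter returns a new array, so the input is not reordered
    .sort(
      (a, b) =>
        a.attributes.reindexOptions!.queueSettings!.queuedAt -
        b.attributes.reindexOptions!.queueSettings!.queuedAt
    )[0];
}

// Unqueued operations all run; of the queued ones, only the oldest runs.
function selectOpsToProcess(inProgressOps: ReindexOp[]): ReindexOp[] {
  const first = getFirstInQueue(inProgressOps);
  const unqueued = inProgressOps.filter(op => !isQueued(op));
  return first ? [...unqueued, first] : unqueued;
}

// Small factory for sample data; omit queuedAt for an unqueued operation.
const makeOp = (id: string, queuedAt?: number): ReindexOp => ({
  id,
  attributes:
    queuedAt === undefined
      ? { indexName: id }
      : { indexName: id, reindexOptions: { queueSettings: { queuedAt } } },
});

const selectedIds = selectOpsToProcess([makeOp('b', 20), makeOp('solo'), makeOp('a', 10)]).map(
  o => o.id
);
```

In this example the unqueued operation "solo" is kept, and of the two queued operations only "a" (the earlier queuedAt) is selected.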

Contributor Author

@jloleysens jloleysens Mar 5, 2020

Thanks for taking a look @joshdover ! I'm happy the implementation (on the whole) makes sense.

I'll clarify the context in which this code runs with a comment because it's not clear from just looking at it here that it runs on a schedule and what exactly "parallel" means.

@joshdover
Contributor

Overall this makes sense to me! This will definitely make upgrading less tedious 😄

@jloleysens
Contributor Author

@alisonelizabeth thanks for taking another look!

Would you mind updating the "How to test" section with the updated response (step 3)

Will do!

would it be useful at all to know where in the queue the index is and how many total are in the queue in the response?

I was not entirely clear on which response you were referring to here. The response from the POST BASE_PATH/reindex/batch endpoint does indicate the enqueued items in the order they are enqueued. But if we want to, on an ongoing basis, know what the queue looks like I think it does make sense to create a GET BASE_PATH/reindex/batch/queue endpoint that returns an array indicating the current operations in the queue. WDYT?

I've also realised we need to cater better for restarting a failed/cancelled operation in a batch (the queuedAt time needs to be updated again) and we probably want a batch/cancel. The latter can wait until we get to the UI I think!
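
A rough sketch of the queuedAt reset, assuming that an operation resumed as part of a batch should rejoin the queue with a fresh timestamp. The function name withFreshQueuedAt and the injectable clock are hypothetical and only there to make the idea concrete.

```typescript
interface ReindexOptions {
  queueSettings?: { queuedAt: number };
}

// Returns a copy of the options with queuedAt set to the current time,
// so that ordering by queuedAt reflects when the operation was re-enqueued.
function withFreshQueuedAt(
  options: ReindexOptions = {},
  now: () => number = Date.now // injectable clock, assumed here for testability
): ReindexOptions {
  return { ...options, queueSettings: { queuedAt: now() } };
}

const staleOptions: ReindexOptions = { queueSettings: { queuedAt: 1583406985489 } };
const resumedOptions = withFreshQueuedAt(staleOptions, () => 1583406990000);
```

The original options object is left untouched; only the returned copy carries the new timestamp.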

Created a new file op_utils where logic responsible for sorting
and ordering reindex operation saved objects is.
Also assert that reindexing is happening in the expected order
This allows users of the API to see what the current queue state
is for visibility. Using the queue endpoint in the API integration
tests for batch too.
If a reindexOperation is being resumed and put in a queue we
also need to reset the queuedAt timestamp to respect the new
batch queue ordering.
@jloleysens
Contributor Author

@elasticmachine merge upstream

elasticmachine and others added 2 commits March 5, 2020 06:27
Added 'undefined' as the second optional param to
resumeIndexOperation call.
@alisonelizabeth
Contributor

I was not entirely clear on which response you were referring to here. The response from the POST BASE_PATH/reindex/batch endpoint does indicate the enqueued items in the order they are enqueued. But if we want to, on an ongoing basis, know what the queue looks like I think it does make sense to create a GET BASE_PATH/reindex/batch/queue endpoint that returns an array indicating the current operations in the queue. WDYT?

Sorry I wasn't very clear. Yeah, that's similar to what I had in mind. I think this makes sense. Thanks!

I've also realised we need to cater better for restarting a failed/cancelled operation in a batch (the queuedAt time needs to be updated again) and we probably want a batch/cancel. The latter can wait until we get to the UI I think!

👍

Contributor

@alisonelizabeth alisonelizabeth left a comment

Latest LGTM. Nice work @jloleysens!

@jloleysens jloleysens merged commit 651d0a9 into elastic:master Mar 6, 2020
@jloleysens jloleysens deleted the ua/batch-reindex-server-side branch March 6, 2020 09:18
jloleysens added a commit to jloleysens/kibana that referenced this pull request Mar 6, 2020
* Added server side logic for handling batch reindex

* Remove literal string interpolation from translation

* Refactor return value of batch endpoint

"sucesses" does not communicate accurately what has happened.
"started" more closely reflects what has happened.

* First iteration of batch queues

* Single queue

Changed the batchqueues implementation to only using a single queue
 - since there is only one ES that it is interacting with.

Before continuing with this work, just making sure that these
precautions are necessary!

* Clean up old batch queue implementation

* Slight refactor

* Revert batch queues implementation

* Introduction of QueueSettings

Queue settings can be set on a reindex operation and set a
timestamp value on the reindex operation for the scheduler
to use down the line for ordering operations and running them
in series

* Updated worker logic to handle items in queue in series

* Refactor /batch endpoint response to "enqueued" not "started"

* Fixed jest tests

* Refactor worker refresh operations for readability

Created a new file op_utils where logic responsible for sorting
and ordering reindex operation saved objects is.

* Add batch API integration test

Also assert that reindexing is happening in the expected order

* Added a new endpoint: GET batch/queue

This allows users of the API to see what the current queue state
is for visibility. Using the queue endpoint in the API integration
tests for batch too.

* Reset the queuedAt timestamp on resume

If a reindexOperation is being resumed and put in a queue we
also need to reset the queuedAt timestamp to respect the new
batch queue ordering.

* Fix jest test

Added 'undefined' as the second optional param to
resumeIndexOperation call.

Co-authored-by: Elastic Machine <[email protected]>
# Conflicts:
#	x-pack/plugins/upgrade_assistant/server/lib/reindexing/error.ts
#	x-pack/plugins/upgrade_assistant/server/lib/reindexing/error_symbols.ts
#	x-pack/plugins/upgrade_assistant/server/lib/reindexing/reindex_service.ts
#	x-pack/plugins/upgrade_assistant/server/routes/reindex_indices/reindex_indices.test.ts
#	x-pack/plugins/upgrade_assistant/server/routes/reindex_indices/reindex_indices.ts
#	x-pack/test/upgrade_assistant_integration/upgrade_assistant/reindexing.js
@jloleysens jloleysens added release_note:enhancement and removed release_note:skip Skip the PR/issue when compiling release notes enhancement New value added to drive a business result labels Mar 6, 2020
@kibanamachine
Contributor

💚 Build Succeeded

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

jloleysens added a commit to jloleysens/kibana that referenced this pull request Mar 6, 2020
…x-closed-index

* 'master' of github.com:elastic/kibana: (32 commits)
  [ML] Use Kibana's HttpHandler for HTTP requests (elastic#59320)
  [APM] Create settings page to manage Custom Links (elastic#57788)
  [Upgrade Assistant] Server-side batch reindexing (elastic#58598)
  completes navigation test (elastic#59141)
  [SIEM] Fixes dragging entries to the Timeline while data is loading may trigger a partial page reload (elastic#59476)
  [Reporting/Screenshots] Handle page setup errors and capture the page, don't fail the job (elastic#58683)
  [SIEM] [CASES] API with io-ts validation (elastic#59265)
  Use camelCase rather than snakeCase for plugin name (elastic#59461)
  [Maps] top term percentage field property (elastic#59386)
  Add custom action to registry and show actions list in siem (elastic#58395)
  [Search service] Add enhanced ES search strategy (elastic#59224)
  [Logs UI] Speed up stream rendering using memoization (elastic#59163)
  expand max-old-space-size for xpack jest tests (elastic#59455)
  Added possibility to embed connectors create and edit flyouts (elastic#58514)
  Revert "Temporarily disabling PR project mappings (elastic#59485)" (elastic#59491)
  Temporarily disabling PR project mappings (elastic#59485)
  [Endpoint] Fix alert list functional test error (elastic#59357)
  Rename status_page to statusPage (elastic#59186)
  Fix visual baseline job (elastic#59348)
  Extended AlertContextValue with metadata optional property (elastic#59391)
  ...

# Conflicts:
#	x-pack/plugins/upgrade_assistant/common/types.ts
#	x-pack/plugins/upgrade_assistant/server/lib/reindexing/reindex_actions.ts
#	x-pack/plugins/upgrade_assistant/server/lib/reindexing/reindex_service.test.ts
#	x-pack/plugins/upgrade_assistant/server/lib/reindexing/reindex_service.ts
#	x-pack/plugins/upgrade_assistant/server/routes/reindex_indices/reindex_indices.test.ts
#	x-pack/plugins/upgrade_assistant/server/routes/reindex_indices/reindex_indices.ts
@kibanamachine
Contributor

Looks like this PR has a backport PR but it still hasn't been merged. Please merge it ASAP to keep the branches relatively in sync.

@kibanamachine kibanamachine added the backport missing Added to PRs automatically when the are determined to be missing a backport. label Mar 7, 2020
jloleysens added a commit that referenced this pull request Mar 7, 2020
@kibanamachine kibanamachine removed the backport missing Added to PRs automatically when the are determined to be missing a backport. label Mar 7, 2020
Labels
Feature:Upgrade Assistant release_note:enhancement Team:Kibana Management Dev Tools, Index Management, Upgrade Assistant, ILM, Ingest Node Pipelines, and more v7.7.0 v8.0.0