-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Upgrade Assistant] Server-side batch reindexing #58598
[Upgrade Assistant] Server-side batch reindexing #58598
Conversation
Pinging @elastic/es-ui (Team:Elasticsearch UI) |
@elasticmachine merge upstream |
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job @jloleysens!
Batch reindexing works great. I may have found a small regression with reindexing a single index. If I create an index, POST /api/upgrade_assistant/reindex/<my_index>
, then execute that request again, I get an Internal Error
error message. However, on master, I get A reindex operation already in-progress for <my_index>
, which is what I would expect. Can you take a look?
@alisonelizabeth I noticed the error handling was a bit problematic with the reindex service return Boom errors. There was an error here with differences between the reindex endpoints. I created this PR #58715 to fix those. Do you think we can merge that one first, then I will resolve the conflicts and we can return to this one? |
@jloleysens 👍sounds good. I'll review that one next. |
…dex-server-side * 'master' of github.com:elastic/kibana: (34 commits) [Upgrade Assistant] Remove "boom" from reindex service (elastic#58715) [data] Clean up QueryStringInput unit tests (elastic#58704) [SIEM] Detection Fix typo in Adobe Hijack Persistence rule (elastic#58804) Restores [SIEM][CASE] Init Configure Case Page (elastic#58121) (elastic#58924) Skips additional failing Ingest Manager integration tests Skips failing Ingest Manager integration tests Move dev tools styles to NP (elastic#58855) change to have kibana --ssl cli option use more recent certs (elastic#57933) disable failing suite (elastic#58942) Don't start pollEsNodesVersion unless someone subscribes (elastic#56923) Do not write UUID file during optimize process (elastic#58899) [Endpoint] Task/add nav bar (elastic#58604) [Metric Alerts] Add backend support for multiple expressions per alert (elastic#58672) [Metrics Alerts] Fix alerting on a rate aggregation (elastic#58789) disable flaky suite (elastic#55953) Revert "[SIEM] apollo@3 (elastic#51926)" and "[SIEM][CASE] Init Confi… (elastic#58806) [resubmit] Prep agg types for new platform (elastic#58893) [Lens] Allow number formatting within Lens (elastic#56253) [Autocomplete] Use settings from config rather than UI settings (elastic#58784) Improve action and trigger types (elastic#58657) ... # Conflicts: # x-pack/plugins/upgrade_assistant/server/routes/reindex_indices/reindex_indices.ts
"sucesses" does not communicate accurately what has happened. "started" more closely reflects what has happened.
@alisonelizabeth pointed out that we probably don't want to, by default, kick off reindex jobs in parallel for batches as these could be really resource intensive. This will require a queue mechanism and probably some other revisions to the current contribution. |
Changed the batchqueues implementation to only using a single queue - since there is only one ES that it is interacting with. Before continuing with this work, just making sure that these pre- cautions are necessary!
Queue settings can be set on a reindex operation and set a timemstamp value on the reindex operation for the scheduler to use down the line for ordering operations and running them in series
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job @jloleysens! Code works as described. The regression I found earlier with the error handling has also been resolved 🎉
Would you mind updating the "How to test" section with the updated response (step 3)? Also, I happened to notice there are existing docs for the UA APIs; we should work with Gail to update these whenever we’re ready (https://www.elastic.co/guide/en/kibana/current/upgrade-assistant-api.html).
Other thoughts - maybe this will become more clear once we start working on the UI (and I get more familiar with UA 😄) but, would it be useful at all to know where in the queue the index is and how many total are in the queue in the response? Also, did you test this with a large number of indices (I only did a handful)? Should we impose any sort of limit?
for (const inProgressOp of inProgressOps) { | ||
if (inProgressOp.attributes.reindexOptions?.queueSettings) { | ||
queueOps.push(inProgressOp); | ||
} else { | ||
parallelOps.push(inProgressOp); | ||
} | ||
} | ||
|
||
if (queueOps.length) { | ||
const [firstInQueueOp] = queueOps.sort( | ||
(a, b) => | ||
a.attributes.reindexOptions!.queueSettings!.queuedAt - | ||
b.attributes.reindexOptions!.queueSettings!.queuedAt | ||
); | ||
|
||
this.log.debug( | ||
`Queue detected; current length ${queueOps.length}, current item ReindexOperation(id: ${firstInQueueOp.id}, indexName: ${firstInQueueOp.attributes.indexName})` | ||
); | ||
|
||
this.inProgressOps = parallelOps.concat(firstInQueueOp); | ||
} else { | ||
this.inProgressOps = parallelOps; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This took me some time to figure out what this was supposed to do. I think this could benefit from a comment that explains the algorithm that decides which reindex operations should be processed.
Alternatively, this could benefit from extracting the logic into a named function that explains what it does:
const inProgressOps = await this.reindexService.findAllByStatus(ReindexStatus.inProgress);
const firstOpInQueue = getFirstInQueue(inProgressOps);
const unqueuedOps = inProgressOps
.filter(op => !!op.attributes.reindexOptions?.queueSettings);
this.inProgressOps = [...unqueuedOps, ...([firstOpInQueue] ? firstOpInQueue : [])];
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look @joshdover ! I'm happy the implementation (on the whole) makes sense.
I'll clarify the context in which this code runs with a comment because it's not clear from just looking at it here that it runs on a schedule and what exactly "parallel" means.
Overall this makes sense to me! This will definitely make upgrading less tedious 😄 |
@alisonelizabeth thanks for taking another look!
Will do!
I was not entirely clear on which response you were referring to here. The response from the I've also realised we need to cater better for restarting a failed/cancelled operation in a batch (the queuedAt time needs to updated again) and we probably want a |
Created a new file op_utils where logic repsonsible for sorting and ordering reindex operation saved objects is.
Also assert that reindexing is happening in the expected order
This allows users of the API to see what the current queue state is for visibility. Using the queue endpoint int he API integration tests for batch too.
If a reindexOperation is being resumed and put in a queue we also need to reset the queuedAt timestamp to respect the new batch queue ordering.
@elasticmachine merge upstream |
Added 'undefined' as the second optional param to resumeIndexOperation call.
Sorry I wasn't vert clear. Yeah, that's similar to what I had in mind. I think this makes sense. Thanks!
👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest LGTM. Nice work @jloleysens!
* Added server side logic for handling batch reindex * Remove literal string interpolation from translation * Refactor return value of batch endpoint "sucesses" does not communicate accurately what has happened. "started" more closely reflects what has happened. * First iteration of batch queues * Single queue Changed the batchqueues implementation to only using a single queue - since there is only one ES that it is interacting with. Before continuing with this work, just making sure that these pre- cautions are necessary! * Clean up old batch queue implementation * Slight refactor * Revert batch queues implementation * Introduction of QueueSettings Queue settings can be set on a reindex operation and set a timemstamp value on the reindex operation for the scheduler to use down the line for ordering operations and running them in series * Updated worker logic to handle items in queue in series * Refactor /batch endpoint response to "enqueued" not "started" * Fixed jest tests * Refactor worker refresh operations for readability Created a new file op_utils where logic repsonsible for sorting and ordering reindex operation saved objects is. * Add batch API integration test Also assert that reindexing is happening in the expected order * Added a new endpoint: GET batch/queue This allows users of the API to see what the current queue state is for visibility. Using the queue endpoint int he API integration tests for batch too. * Reset the queuedAt timestamp on resume If a reindexOperation is being resumed and put in a queue we also need to reset the queuedAt timestamp to respect the new batch queue ordering. * Fix jest test Added 'undefined' as the second optional param to resumeIndexOperation call. Co-authored-by: Elastic Machine <[email protected]> # Conflicts: # x-pack/plugins/upgrade_assistant/server/lib/reindexing/error.ts # x-pack/plugins/upgrade_assistant/server/lib/reindexing/error_symbols.ts # x-pack/plugins/upgrade_assistant/server/lib/reindexing/reindex_service.ts # x-pack/plugins/upgrade_assistant/server/routes/reindex_indices/reindex_indices.test.ts # x-pack/plugins/upgrade_assistant/server/routes/reindex_indices/reindex_indices.ts # x-pack/test/upgrade_assistant_integration/upgrade_assistant/reindexing.js
💚 Build SucceededHistory
To update your PR or re-run it, just comment with: |
…x-closed-index * 'master' of github.com:elastic/kibana: (32 commits) [ML] Use Kibana's HttpHandler for HTTP requests (elastic#59320) [APM] Create settings page to manage Custom Links (elastic#57788) [Upgrade Assistant] Server-side batch reindexing (elastic#58598) completes navigation test (elastic#59141) [SIEM] Fixes dragging entries to the Timeline while data is loading may trigger a partial page reload (elastic#59476) [Reporting/Screenshots] Handle page setup errors and capture the page, don't fail the job (elastic#58683) [SIEM] [CASES] API with io-ts validation (elastic#59265) Use camelCase rather than snakeCase for plugin name (elastic#59461) [Maps] top term percentage field property (elastic#59386) Add custom action to registry and show actions list in siem (elastic#58395) [Search service] Add enhanced ES search strategy (elastic#59224) [Logs UI] Speed up stream rendering using memoization (elastic#59163) expand max-old-space-size for xpack jest tests (elastic#59455) Added possibility to embed connectors create and edit flyouts (elastic#58514) Revert "Temporarily disabling PR project mappings (elastic#59485)" (elastic#59491) Temporarily disabling PR project mappings (elastic#59485) [Endpoint] Fix alert list functional test error (elastic#59357) Rename status_page to statusPage (elastic#59186) Fix visual baseline job (elastic#59348) Extended AlertContextValue with metadata optional property (elastic#59391) ... # Conflicts: # x-pack/plugins/upgrade_assistant/common/types.ts # x-pack/plugins/upgrade_assistant/server/lib/reindexing/reindex_actions.ts # x-pack/plugins/upgrade_assistant/server/lib/reindexing/reindex_service.test.ts # x-pack/plugins/upgrade_assistant/server/lib/reindexing/reindex_service.ts # x-pack/plugins/upgrade_assistant/server/routes/reindex_indices/reindex_indices.test.ts # x-pack/plugins/upgrade_assistant/server/routes/reindex_indices/reindex_indices.ts
Looks like this PR has a backport PR but it still hasn't been merged. Please merge it ASAP to keep the branches relatively in sync. |
* Added server side logic for handling batch reindex * Remove literal string interpolation from translation * Refactor return value of batch endpoint "sucesses" does not communicate accurately what has happened. "started" more closely reflects what has happened. * First iteration of batch queues * Single queue Changed the batchqueues implementation to only using a single queue - since there is only one ES that it is interacting with. Before continuing with this work, just making sure that these pre- cautions are necessary! * Clean up old batch queue implementation * Slight refactor * Revert batch queues implementation * Introduction of QueueSettings Queue settings can be set on a reindex operation and set a timemstamp value on the reindex operation for the scheduler to use down the line for ordering operations and running them in series * Updated worker logic to handle items in queue in series * Refactor /batch endpoint response to "enqueued" not "started" * Fixed jest tests * Refactor worker refresh operations for readability Created a new file op_utils where logic repsonsible for sorting and ordering reindex operation saved objects is. * Add batch API integration test Also assert that reindexing is happening in the expected order * Added a new endpoint: GET batch/queue This allows users of the API to see what the current queue state is for visibility. Using the queue endpoint int he API integration tests for batch too. * Reset the queuedAt timestamp on resume If a reindexOperation is being resumed and put in a queue we also need to reset the queuedAt timestamp to respect the new batch queue ordering. * Fix jest test Added 'undefined' as the second optional param to resumeIndexOperation call. Co-authored-by: Elastic Machine <[email protected]> # Conflicts: # x-pack/plugins/upgrade_assistant/server/lib/reindexing/error.ts # x-pack/plugins/upgrade_assistant/server/lib/reindexing/error_symbols.ts # x-pack/plugins/upgrade_assistant/server/lib/reindexing/reindex_service.ts # x-pack/plugins/upgrade_assistant/server/routes/reindex_indices/reindex_indices.test.ts # x-pack/plugins/upgrade_assistant/server/routes/reindex_indices/reindex_indices.ts # x-pack/test/upgrade_assistant_integration/upgrade_assistant/reindexing.js Co-authored-by: Elastic Machine <[email protected]>
Summary
Add the ability for Upgrade Assistant (UA) to handle reindexing in batches. This contribution focusses on the server-side code.
Two new endpoints are exposed to users:
POST /api/upgrade_assistant/reindex/batch
This new endpoint accepts an array of index names to be reindexed.
Body args
{ "indexNames": [] }
How to test
yarn start --verbose
& ES with default configurationThis is an indication that each job is being processed in a queue when a batch submission is made.
(Optional) Make sure that multiple batch requests can be submitted at the same time (submit a batch request to reindex
test3
andtest4
followed by a batch request to reindextest5
andtest6
(after you have created each).(Optional) Submit a request to reindex
["test7", "test7", "test8"]
. The batch endpoint should report that it could not create a new item for the second "test7".cURL commands
GET /api/upgrade_assistant/reindex/batch/queue
This endpoint provides visibility into the current batch queue.
How to test
With Kibana started (does not require
--verbose
flag), hit the previous endpoint followed by this endpoint to get a current view of the reindex batch queue. Order in which operations will occur is indicated by the order in the array.Release Note
Upgrade Assistant has a new batch endpoint that enables submitting multiple indices for reindexing in one network request. Indices are processed one-by-one to minimize cluster resource usage. This also makes it much easier for users to upgrade indices when a new version of the Elastic stack is released.
Checklist
For maintainers