[ML] Addition of the new Model Management tab #115772
x-pack/plugins/ml/server/models/memory_overview/memory_overview_service.ts
@@ -380,6 +380,27 @@ export function getMlClient(
    async getTrainedModelsStats(...p: Parameters<MlClient['getTrainedModelsStats']>) {
      return mlClient.getTrainedModelsStats(...p);
    },
    // TODO update when the new elasticsearch-js client is available
    async getTrainedModelsDeploymentStats(...p: Parameters<MlClient['getTrainedModelsStats']>) {
If these endpoints do not appear automatically in the ES client, you will need to raise an issue in the client spec repo to request that they be added.
This reverts commit 0cf38fb.
 * ML job to run on a given node will do this, and then subsequent ML jobs on the same node will reuse the
 * same already-loaded code.
 */
memoryRes[key as keyof typeof memoryRes] += NATIVE_EXECUTABLE_CODE_OVERHEAD;
@droberts195 should NATIVE_EXECUTABLE_CODE_OVERHEAD be added to the first job on the node by timestamp, or is the current order OK, where it is added to AD jobs before DFA jobs regardless of which type of job appeared on the node first?
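To make the question concrete, here is a minimal sketch of the "fixed order" option: the overhead is added once per node, to the first job category in a fixed priority order that has any memory usage, regardless of which job started first. This is not the PR's implementation. The types, the `memoryByNode` helper, the `ORDER` array, and the value given to `NATIVE_EXECUTABLE_CODE_OVERHEAD` are all assumptions made for illustration.

```typescript
// Placeholder value for illustration only; the real constant lives in the ML plugin.
const NATIVE_EXECUTABLE_CODE_OVERHEAD = 30 * 1024 * 1024;

// Hypothetical job categories and input shape.
type JobType = 'anomaly_detection' | 'dfa' | 'trained_models';

interface JobMemory {
  nodeId: string;
  type: JobType;
  bytes: number;
}

type MemoryBreakdown = Record<JobType, number>;

// Fixed priority order: AD jobs first, then DFA, then trained models.
const ORDER: JobType[] = ['anomaly_detection', 'dfa', 'trained_models'];

function memoryByNode(jobs: JobMemory[]): Map<string, MemoryBreakdown> {
  const result = new Map<string, MemoryBreakdown>();

  // Sum the memory of each job category per node.
  for (const job of jobs) {
    const entry =
      result.get(job.nodeId) ?? { anomaly_detection: 0, dfa: 0, trained_models: 0 };
    entry[job.type] += job.bytes;
    result.set(job.nodeId, entry);
  }

  // Add the native code overhead exactly once per node, to the first
  // category in ORDER that has any memory usage on that node.
  result.forEach((breakdown) => {
    const first = ORDER.find((t) => breakdown[t] > 0);
    if (first !== undefined) {
      breakdown[first] += NATIVE_EXECUTABLE_CODE_OVERHEAD;
    }
  });

  return result;
}
```

With this approach the per-node total is the same whichever category receives the overhead; only the split between categories in the breakdown changes. Attributing it by job timestamp instead would require start times in the input.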
allocated_models: allocatedModels,
memory_overview: {
  machine_memory: {
    // @ts-ignore
can this @ts-ignore be removed or a comment added?
The Elasticsearch client types haven't been updated yet to support adjusted_total_in_bytes. I'll add a TODO comment.
Added in ee2201f
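As an alternative to `@ts-ignore` while the client typings lag behind, the missing field can be declared in a local type extension. The sketch below assumes the older typings expose only `total_in_bytes`, and that the adjusted value should be preferred when present. The interface names and the `getMachineMemoryTotal` helper are hypothetical and not taken from the PR.

```typescript
// Shape the older client typings are assumed to expose.
interface OsMemStats {
  total_in_bytes: number;
}

// Local, hypothetical extension adding the field missing from the typings.
// TODO: remove once the client types include adjusted_total_in_bytes.
interface OsMemStatsWithAdjusted extends OsMemStats {
  adjusted_total_in_bytes?: number;
}

function getMachineMemoryTotal(mem: OsMemStats): number {
  // Widening to the extended type is safe because the extra field is optional.
  const withAdjusted: OsMemStatsWithAdjusted = mem;
  // Fall back to total_in_bytes when the adjusted value is absent.
  return withAdjusted.adjusted_total_in_bytes ?? withAdjusted.total_in_bytes;
}
```

Unlike `@ts-ignore`, which silences every error on the following line, this keeps the rest of the expression type-checked and leaves a single place to clean up when the typings catch up.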
const adMemoryReport = await memoryOverviewService.getAnomalyDetectionMemoryOverview();
const dfaMemoryReport = await memoryOverviewService.getDFAMemoryOverview();

// @ts-ignore
can this @ts-ignore be removed or a comment added?
Fixed in 3b2a39c
LGTM
Tested and LGTM ⚡
💛 Build succeeded, but was flaky
💔 Backport failed. The backport operation could not be completed.
Sorry @darnautov, but this broke types when it was merged because of a conflict with #113950. Please resubmit the PR with the latest master merged and we can get it back in.
* [ML] Nodes overview for the Model Management page (#115772)
* [ML] trained models tab
* [ML] wip nodes list
* [ML] add types
* [ML] add types
* [ML] node expanded row
* [ML] wip show memory usage
* [ML] refactor, use model_memory_limit for dfa jobs
* [ML] fix refresh button
* [ML] add process memory overhead
* [ML] trained models memory overview
* [ML] add jvm size, remove node props from the response
* [ML] fix tab name
* [ML] custom colors for the bar chart
* [ML] sub jvm size
* [ML] updates for the model list
* [ML] apply native process overhead
* [ML] add adjusted_total_in_bytes
* [ML] start and stop deployment
* [ML] fix default sorting
* [ML] fix types issues
* [ML] fix const
* [ML] remove unused i18n strings
* [ML] fix lint
* [ML] extra custom URLs test
* [ML] update tests for model provider
* [ML] add node routing state info
* [ML] fix functional tests
* [ML] update for es response
* [ML] GetTrainedModelDeploymentStats
* [ML] add deployment stats
* [ML] add spacer
* [ML] disable stop allocation for models with pipelines
* [ML] fix type
* [ML] add beta label
* [ML] move beta label
* [ML] rename model_size prop
* [ML] update tooltip header
* [ML] update text
* [ML] remove ts ignore
* [ML] update types
* remove commented code
* replace toast notification service
* remove ts-ignore
* remove empty panel
* add comments, update test subjects
* fix ts error
* update comment
* fix applying memory overhead
* Revert "fix applying memory overhead" This reverts commit 0cf38fb.
* fix type, remove ts-ignore
* add todo comment (cherry picked from commit 605e9e2)
* updates for the latest elasticsearch client
* hide allocated models when missing
* [ML] Update jest test mock

Co-authored-by: Quynh Nguyen <[email protected]>
Summary
Resolves #114437 and #114438
Third-party model support introduces the ability to use PyTorch models trained outside of the Stack. For operational and management purposes, this PR adds a nodes overview with a memory breakdown and allocated model info, and updates the model list with actions for deployed models and deployment stats.
Moves trained models to a dedicated top-level nav tab with an experimental badge
Updates layout for the model list
Updates "Type" filter with a model_type, e.g. pytorch, tree_ensemble
Adds "Deployment stats" for pytorch models
Adds nodes list