[ML] Addition of the new Model Management tab #115772

Merged: 55 commits into elastic:master on Oct 26, 2021

@darnautov darnautov commented Oct 20, 2021

Summary

Resolves #114437 and #114438

3rd party model support introduces the ability to use PyTorch models trained outside of the Stack. To help operate and manage these models, this PR adds a nodes overview with a memory breakdown and allocated model info, and updates the model list with actions for deployed models and deployment stats.

  • Moves trained models to a dedicated top-level nav tab with an experimental badge

  • Updates the layout of the model list

  • Updates the "Type" filter with model_type values, e.g. pytorch, tree_ensemble

  • Adds "Deployment stats" for pytorch models

  • Adds a nodes list
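
For illustration, the "Type" filter described in the list above could be derived from each model's model_type roughly as follows. This is a minimal sketch and not the code from this PR: ModelItem, getTypeFilterOptions and filterByType are hypothetical names, and the item shape is simplified.

```typescript
// Hypothetical, simplified shape of an item in the trained models list.
interface ModelItem {
  model_id: string;
  model_type?: string; // e.g. 'pytorch' or 'tree_ensemble'
}

// Collect the distinct model_type values to use as filter options.
function getTypeFilterOptions(models: ModelItem[]): string[] {
  const types = new Set<string>();
  for (const model of models) {
    if (model.model_type !== undefined) {
      types.add(model.model_type);
    }
  }
  return Array.from(types).sort();
}

// Keep only the models whose model_type is among the selected options.
// An empty selection means "no filter applied".
function filterByType(models: ModelItem[], selected: string[]): ModelItem[] {
  if (selected.length === 0) {
    return models;
  }
  return models.filter(
    (model) => model.model_type !== undefined && selected.includes(model.model_type)
  );
}
```

With this shape, models that carry no model_type never show up as a filter option and are hidden as soon as any type is selected.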

@darnautov darnautov self-assigned this Oct 20, 2021
@@ -380,6 +380,27 @@ export function getMlClient(
  async getTrainedModelsStats(...p: Parameters<MlClient['getTrainedModelsStats']>) {
    return mlClient.getTrainedModelsStats(...p);
  },
  // TODO update when the new elasticsearch-js client is available
  async getTrainedModelsDeploymentStats(...p: Parameters<MlClient['getTrainedModelsStats']>) {
Member

If these endpoints do not appear automatically in the Elasticsearch client, you will need to raise an issue in the client spec repo to request that they be added.
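
Until a typed method exists in the client, a wrapper like the one under review typically falls back to a raw transport request. The sketch below shows that pattern under stated assumptions: TransportLike is a hypothetical minimal interface standing in for the client's transport, deploymentStatsPath is a hypothetical helper, and the endpoint path is assumed from the deployment stats API as it existed around this PR, so it may differ in other versions.

```typescript
// Hypothetical minimal view of the client's transport, for illustration only.
interface TransportLike {
  request(params: { method: string; path: string }): Promise<unknown>;
}

// Build the (assumed) deployment stats path for a given model ID.
function deploymentStatsPath(modelId: string): string {
  return `/_ml/trained_models/${encodeURIComponent(modelId)}/deployment/_stats`;
}

// Fallback used while the typed client has no dedicated method.
// TODO: replace with the typed client call once it is available.
async function getTrainedModelsDeploymentStats(
  transport: TransportLike,
  modelId: string
): Promise<unknown> {
  return transport.request({
    method: 'GET',
    path: deploymentStatsPath(modelId),
  });
}
```

Keeping the path construction in its own function makes the temporary fallback easy to unit test and easy to delete once the client spec covers the endpoint.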

* ML job to run on a given node will do this, and then subsequent ML jobs on the same node will reuse the
* same already-loaded code.
*/
memoryRes[key as keyof typeof memoryRes] += NATIVE_EXECUTABLE_CODE_OVERHEAD;
Member

@droberts195 should NATIVE_EXECUTABLE_CODE_OVERHEAD be added to the first job on the node by timestamp, or is the current order OK, where it is added to AD jobs before DFA jobs regardless of which type of job appeared on the node first?
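
To make the question concrete, here is a minimal sketch of applying the overhead once per node in whatever order the jobs are passed in. It is not the code from this PR: JobMemory and applyOverheadOncePerNode are hypothetical names, and the constant's value is an assumption used only for illustration.

```typescript
// Assumed value for illustration; the real constant is defined in the source.
const NATIVE_EXECUTABLE_CODE_OVERHEAD = 30 * 1024 * 1024;

// Hypothetical, simplified memory record for a job running on a node.
interface JobMemory {
  node_id: string;
  job_id: string;
  job_type: 'anomaly_detector' | 'data_frame_analytics';
  model_size: number; // bytes
}

// Add the overhead to the first job seen on each node.
// "First" means first in the input array, so the caller's ordering
// (by job type or by timestamp) decides which job is charged.
function applyOverheadOncePerNode(
  jobs: JobMemory[],
  overhead: number = NATIVE_EXECUTABLE_CODE_OVERHEAD
): JobMemory[] {
  const chargedNodes = new Set<string>();
  return jobs.map((job) => {
    if (chargedNodes.has(job.node_id)) {
      return job;
    }
    chargedNodes.add(job.node_id);
    return { ...job, model_size: job.model_size + overhead };
  });
}
```

If the input lists AD jobs before DFA jobs, an AD job on each node absorbs the overhead; sorting the input by start time first would charge the earliest job instead. The per-node total is the same either way.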

allocated_models: allocatedModels,
memory_overview: {
  machine_memory: {
    // @ts-ignore
Member

Can this @ts-ignore be removed, or a comment added?

Contributor Author

The Elasticsearch client types haven't been updated yet to support adjusted_total_in_bytes. I'll add a TODO comment.

Contributor Author

Added in ee2201f
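
One alternative to a blanket @ts-ignore, shown here only as a sketch, is a narrow local type that adds the missing field until the client types catch up. OsMemStats, OsMemStatsWithAdjusted and getMachineMemoryTotal are hypothetical names, and the fallback to total_in_bytes is an assumption rather than what this PR does.

```typescript
// Hypothetical stand-in for the client's OS memory stats type.
interface OsMemStats {
  total_in_bytes: number;
}

// TODO: remove once the client types include adjusted_total_in_bytes.
interface OsMemStatsWithAdjusted extends OsMemStats {
  adjusted_total_in_bytes?: number;
}

// Prefer the adjusted total when the node reports it, otherwise fall back
// to the plain total (fallback behaviour is an assumption of this sketch).
function getMachineMemoryTotal(mem: OsMemStats): number {
  const withAdjusted = mem as OsMemStatsWithAdjusted;
  return withAdjusted.adjusted_total_in_bytes ?? mem.total_in_bytes;
}
```

The cast is confined to one function, so the rest of the code stays type-checked and the workaround can be removed in a single place.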

const adMemoryReport = await memoryOverviewService.getAnomalyDetectionMemoryOverview();
const dfaMemoryReport = await memoryOverviewService.getDFAMemoryOverview();

// @ts-ignore
Member

Can this @ts-ignore be removed, or a comment added?

Contributor Author

Fixed in 3b2a39c

Member

@jgowdyelastic jgowdyelastic left a comment

LGTM

Contributor

@alvarezmelissa87 alvarezmelissa87 left a comment

Tested and LGTM ⚡

@darnautov darnautov enabled auto-merge (squash) October 26, 2021 16:29
@darnautov darnautov merged commit 605e9e2 into elastic:master Oct 26, 2021
@kibanamachine
Contributor

💛 Build succeeded, but was flaky

Test Failures

  • [job] [logs] OSS Misc Functional Tests / telemetry Telemetry service detects that telemetry cannot be sent in screenshot mode

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id   before   after   diff
ml   1687     1698    +11

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id   before   after   diff
ml   3.6MB    3.6MB   +11.7KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id   before   after    diff
ml   34.5KB   34.6KB   +158.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @darnautov

@kibanamachine
Contributor

💔 Backport failed

The backport operation could not be completed due to the following error:
There are no branches to backport to. Aborting.

The backport PRs will be merged automatically after passing CI.

To backport manually run:
node scripts/backport --pr 115772

spalger added a commit that referenced this pull request Oct 26, 2021
@spalger
Contributor

spalger commented Oct 26, 2021

Sorry @darnautov, but this broke types when it was merged because of a conflict with #113950. Please resubmit the PR with the latest master merged, and we can get it back in.

darnautov added a commit to darnautov/kibana that referenced this pull request Oct 26, 2021
* [ML] trained models tab

* [ML] wip nodes list

* [ML] add types

* [ML] add types

* [ML] node expanded row

* [ML] wip show memory usage

* [ML] refactor, use model_memory_limit for dfa jobs

* [ML] fix refresh button

* [ML] add process memory overhead

* [ML] trained models memory overview

* [ML] add jvm size, remove node props from the response

* [ML] fix tab name

* [ML] custom colors for the bar chart

* [ML] sub jvm size

* [ML] updates for the model list

* [ML] apply native process overhead

* [ML]add adjusted_total_in_bytes

* [ML] start and stop deployment

* [ML] fix default sorting

* [ML] fix types issues

* [ML] fix const

* [ML] remove unused i18n strings

* [ML] fix lint

* [ML] extra custom URLs test

* [ML] update tests for model provider

* [ML] add node routing state info

* [ML] fix functional tests

* [ML] update for es response

* [ML] GetTrainedModelDeploymentStats

* [ML] add deployment stats

* [ML] add spacer

* [ML] disable stop allocation for models with pipelines

* [ML] fix type

* [ML] add beta label

* [ML] move beta label

* [ML] rename model_size prop

* [ML] update tooltip header

* [ML] update text

* [ML] remove ts ignore

* [ML] update types

* remove commented code

* replace toast notification service

* remove ts-ignore

* remove empty panel

* add comments, update test subjects

* fix ts error

* update comment

* fix applying memory overhead

* Revert "fix applying memory overhead"

This reverts commit 0cf38fb.

* fix type, remove ts-ignore

* add todo comment

(cherry picked from commit 605e9e2)
darnautov added a commit that referenced this pull request Oct 27, 2021
* [ML] Nodes overview for the Model Management page  (#115772)

* updates for the latest elasticsearch client

* hide allocated models when missing

* [ML] Update jest test mock

Co-authored-by: Quynh Nguyen
@lcawl lcawl changed the title [ML] Nodes overview for the Model Management page [ML] Addition of the new Model Management tab Nov 1, 2021
Labels
auto-backport (Deprecated - use backport:version if exact versions are needed), buildkite-ci, Feature:3rd Party Models (ML 3rd party models), :ml, release_note:feature (Makes this part of the condensed release notes), v8.0.0
Development

Successfully merging this pull request may close these issues.

[ML] 3rd party models deployment overview
9 participants