
Fix Graphs & Configure Infrastructure #1023

Conversation

@andrewballantyne (Member) commented Mar 17, 2023:

Work towards: #1022

Description

Added the Runtime Metrics page & reworked the Inference Metrics page. Added a stacked line chart.

(Note: the queries do not represent real data right now.)

Global - Inference Metrics (screenshots)

Project - Metrics kebab item added to Runtime (screenshot)

Project - Runtime Metrics (screenshots)

Project - Inference Metrics (screenshots)

Scale options (screenshot)

Stacked Line Chart (random data) (screenshot)

Things that still need to be done (post this PR):

  • Improve the x-axis to show a static, predictable number of tick items (disable Victory's automatic x-axis)
  • Improve tooltips -- stacked charts should share a single tooltip, styled to include the x-axis value (see the sketch after this list)
  • Get queries from @VedantMahabaleshwarkar and add them to the utilities
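
As a rough sketch of those first two items, assuming plain Victory components rather than whatever chart wrappers this repo ends up with: Victory's VictoryVoronoiContainer with voronoiDimension="x" gives one shared tooltip per x value across stacked series, and a fixed tickValues array disables the automatic x-axis:

import * as React from 'react';
import {
  VictoryArea,
  VictoryAxis,
  VictoryChart,
  VictoryStack,
  VictoryVoronoiContainer,
} from 'victory';

// Hypothetical data shape; real values would come from the Prometheus utilities.
const series = [
  [{ x: 1, y: 2 }, { x: 2, y: 3 }, { x: 3, y: 1 }],
  [{ x: 1, y: 1 }, { x: 2, y: 4 }, { x: 3, y: 2 }],
];

const StackedMetricsChart: React.FC = () => (
  <VictoryChart
    containerComponent={
      // voronoiDimension="x" collapses all series into one shared tooltip per
      // x value; the labels callback puts the x value into the tooltip text.
      <VictoryVoronoiContainer
        voronoiDimension="x"
        labels={({ datum }) => `x: ${datum.x}, y: ${datum.y}`}
      />
    }
  >
    {/* A static tickValues list disables Victory's automatic x-axis ticks. */}
    <VictoryAxis tickValues={[1, 2, 3]} />
    <VictoryAxis dependentAxis />
    <VictoryStack>
      {series.map((data, i) => (
        <VictoryArea key={i} data={data} />
      ))}
    </VictoryStack>
  </VictoryChart>
);

export default StackedMetricsChart;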

How Has This Been Tested?

KFDef

apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
metadata:
  name: odh-with-modelmesh
  namespace: opendatahub
spec:
  applications:
    # Base -- NB Images, Manifest, NB Controller
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: odh-common
      name: odh-common
    - kustomizeConfig:
        overlays:
          - additional
        repoRef:
          name: manifests
          path: notebook-images
      name: notebook-images
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: odh-notebook-controller
      name: odh-notebook-controller
    # Model Mesh
    - kustomizeConfig:
        overlays:
          - odh-model-controller
        parameters:
          - name: monitoring-namespace
            value: opendatahub
        repoRef:
          name: manifests-modelmesh
          path: manifests/opendatahub
      name: model-mesh
    - kustomizeConfig:
        parameters:
          - name: deployment-namespace
            value: opendatahub
        repoRef:
          name: manifests
          path: modelmesh-monitoring
      name: modelmesh-monitoring
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: prometheus/cluster
      name: prometheus-cluster
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: prometheus/operator
      name: prometheus-operator
  repos:
    - name: manifests
      uri: https://github.com/opendatahub-io/odh-manifests/tarball/master
    - name: manifests-modelmesh
      uri: https://github.com/opendatahub-io/modelmesh-serving/tarball/main
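
(To reproduce: with the Open Data Hub operator already installed, save the above as kfdef.yaml and apply it, e.g. oc apply -f kfdef.yaml, then follow the steps below.)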

Dashboard steps

  • Create a project
  • Create a runtime
  • Create a model (DM me if you need one; MinIO used to work, might be able to try that -- Slack link for setup steps)
  • Fire requests off to the model's route (see the sketch below)
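
For that last step, a minimal sketch of firing traffic at the model (the route host, model name, and payload are placeholders; the path assumes ModelMesh's KServe v2 REST endpoint):

// Hypothetical route and model name -- substitute the values from your cluster.
const routeHost = 'https://<model-route-host>';
const modelName = 'sample';

// ModelMesh's REST proxy exposes the KServe v2 API; /infer is its inference path.
const fireRequest = async (): Promise<void> => {
  const response = await fetch(`${routeHost}/v2/models/${modelName}/infer`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      inputs: [{ name: 'input', shape: [1, 1], datatype: 'FP32', data: [1.0] }],
    }),
  });
  console.log(response.status);
};

// Fire a handful of requests so the request-count graphs have data points.
for (let i = 0; i < 10; i += 1) {
  void fireRequest();
}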

Test Impact

At this time, no tests are included. We'll need to look at what we can realistically test here -- it's all dependent on graph values, and testing the graphs themselves is more like testing Victory. Not sure we have a Storybook test. We might add unit tests; I'll look to see whether any utilities could benefit.
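
For the utilities angle, a hypothetical Jest test at roughly the scale I have in mind (the per100 helper doesn't exist in the repo; it's a stand-in for whatever data-massaging utilities fall out of this work):

// Stand-in utility: scales raw Prometheus counts down by 100 so y-axis labels
// stay readable (see the "(x100)" discussion below).
const per100 = (values: number[]): number[] => values.map((v) => v / 100);

describe('per100', () => {
  it('divides each raw count by 100', () => {
    expect(per100([100, 250, 0])).toEqual([1, 2.5, 0]);
  });
});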

Request review criteria:

  • The commits have meaningful messages (squashes happen on merge by the bot).
  • Included any necessary screenshots or gifs if it was a UI change.
  • Included tags to the UX team if it was a UI/UX change.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.
  • The developer has added tests or explained why testing cannot be added (unit tests & Storybook for related changes).


openshift-ci bot commented Mar 17, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from andrewballantyne. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@andrewballantyne (Member, Author) commented:

@vconzola Don't focus on the graph data -- this PR is not for that. See the breadcrumbs, name, dropdown options for the time range... that kind of stuff. If those are good, we can consider the UX on-track.

Will look to put more effort into the x-axis and the stacked line chart in a future PR.

@andrewballantyne changed the title from "Fix metrics" to "Fix Graphs & Configure Infrastructure" on Mar 17, 2023
@vconzola commented:

A couple of small things...
(1) Is the number of requests really (x100)? In other words, are the y-axis values the number of requests you're seeing divided by 100? Jeff thought these numbers could get really large and make the y-axis labels look odd, so he suggested dividing the number by 100 and adding "(x100)" to the chart title. But if it's easier to just show the actual number of requests and let Victory handle the y-axis labels, then let's do that.
(2) I see now my mockups aren't consistent (or correct) with respect to breadcrumbs and page titles. Sorry. I think what you've got for the model metric charts looks good, assuming "Sample" is the name of the model. But for the model server metrics the breadcrumb should be "Data science projects > Andrew's test > ovms metrics", where "ovms" is the model server type. For now it will always be "ovms", but later, when we support additional server types like Watson Core Serving, it'll be whatever the type is. (Eventually we might end up naming servers, and we'll use the name instead of the type, but that won't be until we support KServe.) Similarly, the page title should just be "ovms metrics", or whatever the server type is.

@andrewballantyne (Member, Author) commented:

> A couple of small things...
> (1) Is the number of requests really (x100)? In other words, are the y-axis values the number of requests you're seeing divided by 100? Jeff thought these numbers could get really large and make the y-axis labels look odd, so he suggested dividing the number by 100 and adding "(x100)" to the chart title. But if it's easier to just show the actual number of requests and let Victory handle the y-axis labels, then let's do that.

Not yet -- but they will be -- I have a TODO in the code to divide them. But without real data for these things, it's hard to test.
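
For reference, a minimal sketch of that TODO (type and function names are stand-ins, not the PR's actual utilities):

// A Prometheus range-query value is a [timestamp, stringValue] pair.
type PrometheusQueryRangeResultValue = [number, string];

// Divide raw request counts by 100 before charting, per the "(x100)" convention.
const toChartPoints = (values: PrometheusQueryRangeResultValue[]) =>
  values.map(([timestamp, value]) => ({
    x: timestamp,
    y: Number(value) / 100,
  }));

const chartTitle = 'HTTP requests (x100)'; // hypothetical title string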

> (2) I see now my mockups aren't consistent (or correct) with respect to breadcrumbs and page titles. Sorry. I think what you've got for the model metric charts looks good, assuming "Sample" is the name of the model. But for the model server metrics the breadcrumb should be "Data science projects > Andrew's test > ovms metrics", where "ovms" is the model server type. For now it will always be "ovms", but later, when we support additional server types like Watson Core Serving, it'll be whatever the type is. (Eventually we might end up naming servers, and we'll use the name instead of the type, but that won't be until we support KServe.) Similarly, the page title should just be "ovms metrics", or whatever the server type is.

Understood, I'll look to update.

@alexcreasy (Contributor) left a comment:


I've been through the code and nothing really caught my eye as being obviously wrong -- if that's worth anything 😆

However, it has been very useful for me to see which areas of the codebase you're changing for metrics and which components you're using.

So far I haven't been able to get your KFDef working in OSD; I'll have another crack at it tomorrow.

@andrewballantyne (Member, Author) left a comment:

Noticed a couple of refactor typos to correct 😓 Whoops.

  timeframe: TimeframeTitle,
  lastUpdateTime: number,
  setLastUpdateTime: (time: number) => void,
): {
  data: Record<ModelServingMetricType, ContextResourceData<PrometheusQueryRangeResultValue>>;
  data: Record<RuntimeMetricType, ContextResourceData<PrometheusQueryRangeResultValue>>;
@andrewballantyne (Member, Author):
Typo -- this needs to have both types; sad that TypeScript didn't catch it during my refactoring.
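
A sketch of the corrected shape, assuming "both types" means the data map is keyed by the union of the two metric enums (the type aliases below are trimmed stand-ins for the real definitions):

type ModelServingMetricType = 'requestCount' | 'averageResponseTime'; // stand-in values
type RuntimeMetricType = 'cpuUtilization' | 'memoryUtilization'; // stand-in values
type PrometheusQueryRangeResultValue = [number, string];
type ContextResourceData<T> = { data: T[]; loaded: boolean };

// One map keyed by both metric-type unions, instead of two clashing `data` fields.
type MetricsData = Record<
  ModelServingMetricType | RuntimeMetricType,
  ContextResourceData<PrometheusQueryRangeResultValue>
>;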

import RuntimeGraphs from '~/pages/modelServing/screens/metrics/RuntimeGraphs';
import { MetricType } from '~/pages/modelServing/screens/types';

const ProjectInferenceMetricsWrapper: React.FC = () => {
@andrewballantyne (Member, Author):
Typo

return (
  <ModelServingMetricsProvider queries={queries} type={MetricType.RUNTIME}>
    <MetricsPage
      title={`ovm metrics`}
A contributor commented:
This is a fixed label, but we currently support custom runtimes; it might be weird to have a different runtime and still display "ovms". Also, I think it is "ovms" (OpenVINO Model Server), not "ovm".

@andrewballantyne (Member, Author):
I understand it's fixed; this will need to be adjusted when we get the proper runtime changes...

cc @vconzola -- please address the comment about the name; I got this "ovm" concept from you. I can definitely expand it -- I just didn't know what OVM was 😛


@andrewballantyne As I mentioned in my comment above, "ovms" is the model server type - OpenVINO Model Server - which comes from the Type column of the model server table. Currently that's all we support. But as we add support for Watson Core Serving and other runtimes, this value will change.

@andrewballantyne (Member, Author) commented Mar 22, 2023:

> But for the model server metrics the breadcrumb should be "Data science projects > Andrew's test > ovms metrics", where "ovms" is the model server type

@vconzola you explicitly said "ovm"... do we want to expand it to the full name? That's fine, I just need clarification. Do we say "OpenVINO Model Serving" anywhere? 🤔

And yes, I am aware it is fixed text... but unless we have all of this text floating around in objects (I don't think we do), it's hardcoded until we support custom runtimes.


"ovms" is what shows up in the model server table Type column. I'm not sure where that text comes from because the user currently doesn't select a runtime. It must come the backend someplace. I want what's in the chart title to match what's in the table so the user can make a 1-1 connection. Just FYI, once we support multiple servers (which should be next sprint, I think) the "Type" is going to be replace by a "Name", so all this will change. I'll show what I mean in the UX meeting tomorrow.

A contributor commented:

The ServingRuntime object has a name attribute; the default one is "ovms", for example. I think we can fetch that name dynamically, since it can change depending on the custom runtime installed.
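
A minimal sketch of that idea (not the dashboard's actual API -- the resource shape is trimmed and the fallback is an assumption):

// Trimmed stand-in for the k8s ServingRuntime resource.
type ServingRuntimeKind = {
  metadata: { name: string };
};

const getRuntimeMetricsLabel = (runtime?: ServingRuntimeKind): string => {
  // Fall back to "ovms", the only supported type today, if the resource
  // hasn't loaded yet.
  const name = runtime?.metadata.name ?? 'ovms';
  return `${name} metrics`;
};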

    link: `/projects/${currentProject.metadata.name}`,
  },
  {
    label: `ovm metrics`,
A contributor commented:
Same with this label.

@andrewballantyne (Member, Author) commented:

Will adjust the PR tomorrow -- might even get some queries as well 🎉

@andrewballantyne (Member, Author) commented:

Going to merge this into the feature branch -- we can swing back to the pending comments and the right queries. I've made notes on the ticket so the conversations are not lost.

Adding labels manually so @alexcreasy can work off this refactor work.



openshift-ci bot commented Mar 27, 2023

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot merged commit d9e950c into opendatahub-io:f/mserving-metrics on Mar 27, 2023
alexcreasy pushed a commit to alexcreasy/odh-dashboard that referenced this pull request on Mar 29, 2023

* Rework Inference Metrics & Add Runtime Metrics (invalid queries)

* Add stacked line chart functionality
openshift-merge-robot pushed a commit that referenced this pull request on Jun 30, 2023
* Re-enable Metrics

* Fix Graphs & Configure Infrastructure (#1023)

* Rework Inference Metrics & Add Runtime Metrics (invalid queries)

* Add stacked line chart functionality

* Trustyai demo phase0 (#1093)

* Explainability: Fairness and Bias Metrics (Phase 0) (#1001) (#1006) (#1007) (#1008)
  - Initial feature set for TrustyAI related UI functionality
  - Adds tab based navigation to modelServing screen
  - Adds a bias metrics tab with charts for visualising SPD and DIR metrics
  - Enhances Prometheus query features for accessing TrustyAI data
  - Enhancements to the MetricsChart component, making it more configurable

* Update key of request name to match trusty backend

* Remove unnecessary div and inline style from tooltip

* Remove 15 minutes refresh option

* Prefer optional prop to type union with undefined

* Move function definitions inline

* Prefer narrowing over type conversion

* Inline tab change handler

* Remove toolbar option from ApplicationsPage

* Inline domain calculator functions

* Move defaultDomainCalculator to utils

* Return null instead of undefined

* Use threshold label instead of index for key

* Add enum for tab keys

* Remove magic numbers from domain calculations

* Make ResponsePredicate mandatory and add predicate to useQueryRangeResourceData

* TrustyAI Client (#1318)

* Add support for insecure http requests in development mode

* Adds low level API client for TrustyAI service

* Adds TrustyAI high level API and contexts

* Get scheme of TrustyAI route from k8s data

* Add model bias configuration table (#1290)

* Add model bias configuration table

* rebase and remove mock data

* Update Trusty AI client to handle API changes (#1336) (#1337)

* Add bias metrics configuration modal (#1343)

* Add configuration modal

* address comments

* get rid of some TODOs and refine the route

* Multi-metric display on model bias screen (#1273) (#1349)

* Enhancements to model bias screen

  * Display of multiple bias charts simultaneously

  * Multi-select component, allowing free-text or select-from-list selection of charts to display

  * Ability to collapse / expand individual charts

  * User selectable refresh rates of chart data

  * Chart selection and open/closed status is persisted to the session cache for the life of the user's browser session

* Display user-defined threshold values on charts (#1163)

  * Clean up of bias chart logic

  * Displays thresholds chosen by user, or defaults if none.

  * Improves domain and threshold calculation based on user values or defaults

* Fix metrics submission issue and handle errors (#1378)

* Fix metrics submission issue and handle errors

* fix lint issue

* use error handler on GET functions

* Default and restrict threshold, add tooltips, default duplicate name and set feature flag (#1390)

* Default and restrict threshold, add tooltips, default duplicate name and set feature flag

* fix lint

* add tooltips and dropdown descriptions

* clear data when closing configuration modal

* really solve deleting issue, make empty table view a common component and apply it everywhere

* address comments

* Minor enhancements to bias chart (#1386) (#1399)

* Adds refresh interval options that match openshift observability dashboard

* Show first chart from list, if none selected when user first navigates to bias tab

* Use search icon instead of plus for nothing selected empty state

* Fix error with calculation of 30 days constant

* Deleted charts are removed from session storage

* Fixes issue with bias charts auto-refreshing with stale data (#1403) (#1404)

* Refactor prometheus queries to remove duplication

* Fix graph not refreshing issue

* Add code review suggestions

* Add performance metrics feature flag and refactor runtime server route (#1413)

* Add performance metrics feature flag, refactor runtime server route and solve layout issues

* revert some style changes

* Model serving metrics renaming (#1421)

* Adds support for TrustyAI Operator (#1443)

* Adds support for TrustyAI Operator (#1276)

* Changes from feedback

---------

Co-authored-by: Andrew Ballantyne <[email protected]>
Co-authored-by: Andrew Ballantyne <[email protected]>
Co-authored-by: Alex Creasy <[email protected]>
Co-authored-by: Alex Creasy <[email protected]>
@andrewballantyne deleted the fix-metrics branch on June 26, 2024, 18:28
Development

Successfully merging this pull request may close these issues.

[Feature Request]: Readdress our needs for Model Serving Metrics