
Charms to be integrated with grafana #834

Closed
6 of 36 tasks
orfeas-k opened this issue Feb 21, 2024 · 4 comments
Labels
enhancement New feature or request


@orfeas-k
Contributor

orfeas-k commented Feb 21, 2024

This is a tracker issue documenting progress on the CKF charms that can be integrated with Grafana (i.e. charms that provide a functional and useful Grafana dashboard).

Update: MLflow should be added too, since its dashboard shows "No data".


Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5364.

This message was autogenerated

@natalytvinova

I can confirm that the Jupyter notebooks, MinIO and Seldon controller dashboards are missing data, while the katib-controller and argo-controller dashboards work on the current deployment. Could we also add MLflow to that list? The MLflow metrics dashboard is missing data as well.

@orfeas-k
Contributor Author

orfeas-k commented Mar 6, 2024

MinIO

The issue there is that the dashboard queries carry a `job=scrape_jobs` label, which prevents the panels from rendering anything. Also, the `minio_cluster_capacity` metric is not provided by MinIO (anymore, I guess).
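A minimal Python sketch of the label fix, assuming the dashboard is a standard Grafana JSON model with PromQL queries in each panel's `targets[].expr`. The helpers `strip_label_matcher` and `strip_from_dashboard` are hypothetical, and the comma-splitting parser is a simplification that only handles plain matcher values:

```python
import re

# Matches the body of each {...} label-matcher block in a PromQL expression.
SELECTOR = re.compile(r"\{([^}]*)\}")


def strip_label_matcher(expr: str, label: str, value: str) -> str:
    """Drop label="value" (or label=~"value") from every selector in expr.

    Assumes matcher values contain no commas or closing braces, which holds
    for simple dashboard queries but not for arbitrary PromQL.
    """
    unwanted = {f'{label}="{value}"', f'{label}=~"{value}"'}

    def rewrite(match: re.Match) -> str:
        matchers = [m.strip() for m in match.group(1).split(",") if m.strip()]
        kept = [m for m in matchers if m.replace(" ", "") not in unwanted]
        # An empty selector after a metric name is redundant, so drop the braces.
        return "{" + ",".join(kept) + "}" if kept else ""

    return SELECTOR.sub(rewrite, expr)


def strip_from_dashboard(dashboard: dict, label: str, value: str) -> dict:
    """Apply strip_label_matcher to the expr of every top-level panel target."""
    for panel in dashboard.get("panels", []):
        for target in panel.get("targets", []):
            if "expr" in target:
                target["expr"] = strip_label_matcher(target["expr"], label, value)
    return dashboard
```

Only top-level panels are visited; panels nested inside collapsed rows would need the same treatment.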

orfeas-k added a commit to canonical/envoy-operator that referenced this issue Apr 4, 2024
* Rename relation from `grafana-dashboards` to `grafana-dashboard` in order to:
  * follow what every other CKF charm does.
  * not diverge from the grafana library default.

Part of canonical/bundle-kubeflow#856
Refs canonical/bundle-kubeflow#834
orfeas-k added a commit to canonical/notebook-operators that referenced this issue Apr 5, 2024
* Remove `uid` from the `datasource` fields.
* fix typo
* Add tag `ckf` to dashboard.
* Increase retry_for_attempts to 10 attempts
* unpin prometheus

Part of canonical/bundle-kubeflow#856
Refs canonical/bundle-kubeflow#834
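One plausible reason a hard-coded `uid` breaks the panels is that it points at a datasource in the Grafana instance where the dashboard was authored, which does not exist in a fresh deployment. A hedged Python sketch of the `uid` removal, assuming the dashboard is loaded as a plain dict from its JSON file (`drop_datasource_uids` is a hypothetical helper, not part of any charm library):

```python
from typing import Any


def drop_datasource_uids(node: Any) -> Any:
    """Recursively delete the "uid" key from every "datasource" object."""
    if isinstance(node, dict):
        datasource = node.get("datasource")
        if isinstance(datasource, dict):
            # Keep "type" and any other keys; only the pinned uid goes away.
            datasource.pop("uid", None)
        for value in node.values():
            drop_datasource_uids(value)
    elif isinstance(node, list):
        for item in node:
            drop_datasource_uids(item)
    return node
```

Only `uid` keys inside `datasource` objects are removed; the dashboard's own top-level `uid` is left untouched.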
orfeas-k added a commit to canonical/argo-operators that referenced this issue Apr 5, 2024
* Add `ckf` tag to argo-controller's grafana dashboard.
* Fix dashboard panels not working by:
  * Replacing unavailable metrics with available ones
  * Using a 2-minute range instead of 1 minute wherever rate() is used, since
    rate() requires more than one scrape data point.
  * Removing rate() from the panel that shows the total number of log messages.

Part of canonical/bundle-kubeflow#856
Refs canonical/bundle-kubeflow#834
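Why the 2-minute range helps: `rate()` computes a per-second slope, so it needs at least two samples inside its range. The simplified model below assumes a 1-minute scrape interval (Prometheus's default global `scrape_interval`) and treats the range as left-open, ignoring the edge case of a scrape landing exactly on the window boundary. The function names are hypothetical:

```python
def samples_in_window(window_s: int, scrape_interval_s: int, now_s: int,
                      first_scrape_s: int = 0) -> int:
    """Count scrapes taken at first_scrape_s + k * scrape_interval_s that
    fall inside the range (now_s - window_s, now_s]."""
    count = 0
    t = first_scrape_s
    while t <= now_s:
        if t > now_s - window_s:
            count += 1
        t += scrape_interval_s
    return count


def rate_has_data(window_s: int, scrape_interval_s: int, now_s: int,
                  first_scrape_s: int = 0) -> bool:
    """rate() needs at least two samples in its range to compute a slope."""
    return samples_in_window(window_s, scrape_interval_s, now_s, first_scrape_s) >= 2
```

Under these assumptions a `[1m]` range holds a single sample, so `rate()` returns nothing and the panel shows "No data", while a `[2m]` range holds two.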
orfeas-k added a commit to canonical/envoy-operator that referenced this issue Apr 5, 2024
* Add `ckf` tag to the grafana dashboard.
* Fix dashboard panels not working by: 
  * Replacing unavailable metrics with available ones
  * Using a 2-minute range instead of 1 minute wherever rate() is used, since
    rate() requires more than one scrape data point.
  * Removing rate() from panels that show percentages.
  * Removing labels that the metrics don't provide.

Part of canonical/bundle-kubeflow#856
Refs canonical/bundle-kubeflow#834
Closes #73
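On the last point: Prometheus treats a label that a series does not have as an empty string, so a matcher such as `response_code="200"` on a metric without that label selects no series at all. A rough Python sketch that flags such matchers, assuming the set of labels each metric actually exposes is already known (for example, read off the Prometheus UI). `missing_labels` is a hypothetical helper:

```python
import re
from typing import List, Set

# Body of each {...} label-matcher block in a PromQL expression.
SELECTOR = re.compile(r"\{([^}]*)\}")
# A label name followed by one of PromQL's four matcher operators and a quote.
MATCHER = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)\s*(?:=~|!~|!=|=)\s*"')


def missing_labels(expr: str, available: Set[str]) -> List[str]:
    """Return label names matched in expr that the metric does not expose."""
    used: Set[str] = set()
    for body in SELECTOR.findall(expr):
        used.update(MATCHER.findall(body))
    return sorted(used - available)
```

For example, `missing_labels('sum(rate(example_requests_total{cluster_name="a", response_code=~"5.."}[2m]))', {"cluster_name"})` returns `["response_code"]`, where `example_requests_total` is a made-up metric name.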
orfeas-k added a commit to canonical/seldon-core-operator that referenced this issue Apr 5, 2024
* Fix grafana dashboard by removing `uid` from the `datasource` fields.
* Add tags `ckf` and `seldon` to dashboard.

Part of canonical/bundle-kubeflow#856
Refs canonical/bundle-kubeflow#834
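The tags added here live in the top-level `tags` array of the Grafana dashboard JSON and can be used to filter dashboards in Grafana's search. A small, idempotent Python sketch, assuming the dashboard is handled as a dict (`add_tags` is a hypothetical helper):

```python
def add_tags(dashboard: dict, *tags: str) -> dict:
    """Append each tag to the dashboard's "tags" list unless already present."""
    existing = dashboard.setdefault("tags", [])
    for tag in tags:
        if tag not in existing:
            existing.append(tag)
    return dashboard
```

Running it twice leaves the tag list unchanged, so it is safe to apply to dashboards that already carry some of the tags.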
orfeas-k added a commit to canonical/katib-operators that referenced this issue Apr 8, 2024
* Add `ckf` and `katib` tag to katib-controller's grafana dashboard.
* Fix `datasource` field
* Fix current experiments & trials panel by:
  * adding a `legendFormat` field
  * converting it to a time series & mapping no value to 0 in order to account
    for the fact that katib-controller doesn't output any `current` metrics when
    there is no experiment or trial running.

Ref canonical/bundle-kubeflow#856
Ref canonical/bundle-kubeflow#834
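A hedged sketch of what such a panel could look like in Grafana's JSON model, built from Python. It assumes the "no value to 0" mapping is done with the `noValue` standard field option; the commit does not say whether `noValue` or a value mapping was used. `current_count_panel` and the metric `example_experiments_current` are hypothetical:

```python
def current_count_panel(title: str, expr: str, legend: str) -> dict:
    """Build a time-series panel that displays 0 instead of "No data"."""
    return {
        "type": "timeseries",
        "title": title,
        "targets": [
            # legendFormat names each series; "{{namespace}}", for example,
            # would use the value of the series' namespace label.
            {"expr": expr, "legendFormat": legend, "refId": "A"},
        ],
        "fieldConfig": {
            # noValue is the text Grafana displays when the query returns nothing.
            "defaults": {"noValue": "0"},
            "overrides": [],
        },
    }
```

`noValue` only changes what is displayed. If an actual zero-valued series is needed, a PromQL fallback such as `sum(example_experiments_current) or vector(0)` is a common alternative.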
orfeas-k added a commit to canonical/envoy-operator that referenced this issue Apr 9, 2024
* Rename relation from `grafana-dashboards` to `grafana-dashboard` in order to:
  * follow what every other CKF charm does.
  * not diverge from the grafana library default.

Part of canonical/bundle-kubeflow#856
Refs canonical/bundle-kubeflow#834
orfeas-k added a commit to canonical/envoy-operator that referenced this issue Apr 9, 2024
* Add `ckf` tag to the grafana dashboard.
* Fix dashboard panels not working by: 
  * Replacing unavailable metrics with available ones
  * Using a 2-minute range instead of 1 minute wherever rate() is used, since
    rate() requires more than one scrape data point.
  * Removing rate() from panels that show percentages.
  * Removing labels that the metrics don't provide.

Part of canonical/bundle-kubeflow#856
Refs canonical/bundle-kubeflow#834
Closes #73
orfeas-k added a commit to canonical/notebook-operators that referenced this issue Apr 9, 2024
* Remove `uid` from the `datasource` fields.
* fix typo
* Add tag `ckf` to dashboard.
* Increase retry_for_attempts to 10 attempts
* unpin prometheus

Part of canonical/bundle-kubeflow#856
Refs canonical/bundle-kubeflow#834
orfeas-k added a commit to canonical/argo-operators that referenced this issue Apr 9, 2024
* Add `ckf` tag to argo-controller's grafana dashboard.
* Fix dashboard panels not working by:
  * Replacing unavailable metrics with available ones
  * Using a 2-minute range instead of 1 minute wherever rate() is used, since
    rate() requires more than one scrape data point.
  * Removing rate() from the panel that shows the total number of log messages.

Part of canonical/bundle-kubeflow#856
Refs canonical/bundle-kubeflow#834
orfeas-k added a commit to canonical/seldon-core-operator that referenced this issue Apr 9, 2024
* Fix grafana dashboard by removing `uid` from the `datasource` fields.
* Add tags `ckf` and `seldon` to dashboard.

Part of canonical/bundle-kubeflow#856
Refs canonical/bundle-kubeflow#834
orfeas-k added a commit to canonical/katib-operators that referenced this issue Apr 9, 2024
* Add `ckf` and `katib` tag to katib-controller's grafana dashboard.
* Fix `datasource` field
* Fix current experiments & trials panel by:
  * adding a `legendFormat` field
  * converting it to a time series & mapping no value to 0 in order to account
    for the fact that katib-controller doesn't output any `current` metrics when
    there is no experiment or trial running.

Ref canonical/bundle-kubeflow#856
Ref canonical/bundle-kubeflow#834
orfeas-k added a commit to canonical/envoy-operator that referenced this issue Apr 9, 2024
* Add `ckf` tag to the grafana dashboard.
* Fix dashboard panels not working by: 
  * Replacing unavailable metrics with available ones
  * Using a 2-minute range instead of 1 minute wherever rate() is used, since rate() requires more than one scrape data point.
  * Removing rate() from panels that show percentages.
  * Removing labels that the metrics don't provide.

Part of canonical/bundle-kubeflow#856
Ref canonical/bundle-kubeflow#834
Ref #73
orfeas-k added a commit to canonical/envoy-operator that referenced this issue Apr 9, 2024
* Rename relation from `grafana-dashboards` to `grafana-dashboard` in order to:
  * follow what every other CKF charm does.
  * not diverge from the grafana library default.

Ref canonical/bundle-kubeflow#856
Ref canonical/bundle-kubeflow#834
orfeas-k added a commit to canonical/minio-operator that referenced this issue Apr 9, 2024
Fix the dashboard:
* Remove `job=scrape_jobs` label from dashboard fields
* Remove panels using metrics that are not provided by the minio workload.
* Reorder panels to fill the gaps left by the removed panels.

On top of that, it adds a `ckf` tag to the dashboard.

Ref canonical/bundle-kubeflow#856
Ref canonical/bundle-kubeflow#834
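A Python sketch of the panel removal and reordering, assuming the panels carry Grafana's standard `gridPos` (`x`, `y`, `w`, `h` on a 24-column grid). The greedy row packing is one simple strategy, not necessarily how the commit reordered the panels; `drop_and_repack` and any `example_metric_*` names are hypothetical, while `minio_cluster_capacity` is the metric mentioned earlier in this thread:

```python
import re
from typing import Iterable, List

GRID_WIDTH = 24  # Grafana's dashboard grid is 24 columns wide.


def uses_metric(panel: dict, metrics: List[str]) -> bool:
    """True if any target expression references one of the given metric names."""
    exprs = [t.get("expr", "") for t in panel.get("targets", [])]
    return any(re.search(rf"\b{re.escape(m)}\b", e) for m in metrics for e in exprs)


def drop_and_repack(dashboard: dict, unavailable: Iterable[str]) -> dict:
    """Remove panels that query unavailable metrics, then close the gaps."""
    missing = list(unavailable)
    kept = [p for p in dashboard.get("panels", []) if not uses_metric(p, missing)]
    # Greedy row packing in the original reading order (top-to-bottom, left-to-right).
    kept.sort(key=lambda p: (p["gridPos"]["y"], p["gridPos"]["x"]))
    x = y = row_height = 0
    for panel in kept:
        pos = panel["gridPos"]
        if x + pos["w"] > GRID_WIDTH:
            # Panel does not fit in the current row: start a new one below it.
            x, y, row_height = 0, y + row_height, 0
        pos["x"], pos["y"] = x, y
        x += pos["w"]
        row_height = max(row_height, pos["h"])
    dashboard["panels"] = kept
    return dashboard
```

Panels nested inside collapsed rows are not inspected by this sketch.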
orfeas-k added a commit to canonical/argo-operators that referenced this issue Apr 9, 2024
* Add `ckf` tag to argo-controller's grafana dashboard.
* Fix dashboard panels not working by:
  * Replacing unavailable metrics with available ones
  * Change 2 minutes instead of 1 in places where rate() is used since
    this requires more than one scrape data points.
  * Remove rate() from panel that shows total number of log messages.

Ref canonical/bundle-kubeflow#856
Ref canonical/bundle-kubeflow#834
orfeas-k added a commit to canonical/seldon-core-operator that referenced this issue Apr 9, 2024
* Fix grafana dashboard by removing `uid` from the `datasource` fields.
* Add tags `ckf` and `seldon` to dashboard.

Ref canonical/bundle-kubeflow#856
Ref canonical/bundle-kubeflow#834
orfeas-k added a commit to canonical/notebook-operators that referenced this issue Apr 9, 2024
* Remove `uid` from the `datasource` fields.
* fix typo
* Add tag `ckf` to dashboard.
* Increase retry_for_attempts to 10 attempts
* unpin prometheus

Part of canonical/bundle-kubeflow#856
Ref canonical/bundle-kubeflow#834
orfeas-k added a commit to canonical/katib-operators that referenced this issue Apr 9, 2024
…from #172 (#173)

* Add `ckf` and `katib` tag to katib-controller's grafana dashboard.
* Fix `datasource` field
* Fix current experiments & trials panel by:
  * adding a `legendFormat` field
  * converting it to a time series & mapping no value to 0 in order to account
    for the fact that katib-controller doesn't output any `current` metrics when
    there is no experiment or trial running.

Ref canonical/bundle-kubeflow#856
Ref canonical/bundle-kubeflow#834
@orfeas-k
Contributor Author

Closing, since these are now tracked in a spreadsheet.
