Camunda Process Monitoring #1229

Closed
2 tasks
simonhir opened this issue Jan 17, 2024 · 7 comments · Fixed by #1306
Labels: enhancement (New feature or request), internal

Comments

@simonhir
Member

simonhir commented Jan 17, 2024

Is your feature request related to a problem? Please describe.

As the provider of the digiwf platform, I want to know whether all processes still work after a release. For that, it would be great to have an overview of started, running, completed, and failed process instances.
As a process developer, I want to know whether a release (of digiwf or of a process definition) broke one of my processes, and to be notified when incidents are increasing.

Describe the solution you'd like

A graphical overview of started, running, completed, and failed process instances, both for the complete cluster and per process. Grafana, fed by the Prometheus endpoint that Camunda provides, would be a good solution.
Notifications should also be possible when incidents increase quickly or when 100% of instances fail. This is possible with Grafana as well.

  • Prometheus metrics have been broken since SB3
    • Possibly caused by missing dependencies or by security settings
  • Grafana is already deployed but doesn't receive any data
  • Configure alerting in Grafana
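
To check whether the Prometheus endpoint delivers any data at all, the scraped text can be parsed and summed per metric. The following is a minimal sketch, assuming the endpoint returns the standard Prometheus text exposition format. The metric names in `SAMPLE` are hypothetical and are not the names Camunda actually exports, and the parser is deliberately simplified (it does not handle every escaping rule of the format).

```python
from collections import defaultdict


def parse_metrics(text: str) -> dict[str, float]:
    """Sum sample values per metric name from Prometheus text exposition format."""
    totals: dict[str, float] = defaultdict(float)
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Sample with labels: name{label="value"} 12 [timestamp]
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            # Sample without labels: name 12 [timestamp]
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        totals[name] += float(fields[0])
    return dict(totals)


# Hypothetical payload, only for illustration.
SAMPLE = """\
# HELP process_instances_failed Hypothetical counter of failed instances
# TYPE process_instances_failed counter
process_instances_failed{process="a"} 3
process_instances_failed{process="b"} 1
process_instances_started 40
"""

if __name__ == "__main__":
    print(parse_metrics(SAMPLE))
```

An empty result for a real scrape would suggest that the problem sits in the application (dependencies, security) and not in Grafana.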

Describe alternatives you've considered

There are no real alternatives. Other monitoring solutions such as appd would only detect failures when calling integrations, not logical errors or internal script errors. Such monitoring tools would be more of an extension for deeper insight.

Acceptance Criteria

  • Failing process instances are monitored and can be viewed graphically
  • Alerts are sent when failing instances increase (a suitable threshold logic still needs to be found)
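
One possible starting point for the threshold logic is sketched below. It compares two consecutive time windows and alerts either when all started instances fail or when failures grow quickly. The `Window` type, the function name, and all default thresholds are assumptions made for illustration; nothing here is decided in this issue, and in practice the rule would likely be expressed as a Grafana alert rule rather than in Python.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Counts observed in one time window (hypothetical structure)."""
    started: int
    failed: int


def should_alert(
    previous: Window,
    current: Window,
    max_failure_ratio: float = 1.0,   # assumption: alert when 100% fail
    max_growth_factor: float = 2.0,   # assumption: "fast" means doubling
    min_failed: int = 5,              # assumption: ignore very small counts
) -> bool:
    # Nothing was started in this window, so there is nothing to judge.
    if current.started == 0:
        return False
    # Every started instance (or the configured share) failed.
    if current.failed / current.started >= max_failure_ratio:
        return True
    # Suppress noise from a handful of failures.
    if current.failed < min_failed:
        return False
    # Failures grew quickly compared with the previous window.
    return current.failed >= max_growth_factor * max(previous.failed, 1)
```

For example, `should_alert(Window(100, 4), Window(100, 12))` alerts because failures tripled, while `should_alert(Window(100, 10), Window(100, 12))` does not.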

Additional context

@darenegade
Member

Please add your planning poker estimate with Zenhub @dominikhorn93

@dominikhorn93 dominikhorn93 self-assigned this Feb 2, 2024
@dominikhorn93
Contributor

Everything still works on the demo environment: https://grafana-route-digiwf-demo-capmanaged.apps.capk.muenchen.de/d/rjaygWhnk/camunda-dashboard?orgId=1

The cause here is probably that the data was pulled from the old REST API, which now only runs in the Optimize environments.

@darenegade
Member

Decision record: The dashboard is only needed on full environments that include Optimize. So nothing needs to be fixed here; only the documentation needs to be updated with the links.

Adjusting the dashboard and setting up the alert remain part of this ticket.

@zambrovski
Contributor

But we are measuring far too little. I will add more monitors, and then we will take it from there.

@zambrovski zambrovski self-assigned this Feb 8, 2024
@zambrovski zambrovski mentioned this issue Feb 15, 2024
8 tasks
@darenegade
Member

@simonhir will check whether everything runs on the environments. Then this issue will be closed.

@simonhir
Member Author

It works. Further analysis and adjustments will be done as part of #1251.


4 participants