[Mandalorian/Phoenix/Titans] - ENABLER - Production-grade implementation of integration events #462

Open
11 tasks
krmoos opened this issue Jun 29, 2023 · 5 comments
@krmoos
Contributor

krmoos commented Jun 29, 2023

Synopsis

As any DH3 stakeholder
I want a production-grade implementation of the event-driven design
So that business processes don't get stuck or go haywire
And the system is monitored so that problems are detected early
And developers can quickly identify problems and make the system recover fast

Notes:

  • The feature might depend on a production-grade implementation of monitoring/logging/surveillance/alarms
  • A similar feature is required for point-to-point communication

Acceptance Criteria

  • It is known and implemented how to handle dead letters
  • Suitable logging/monitoring/alarms have been implemented in order to detect problems or anomalies early
  • Product teams use a shared DH3 platform (NuGet packages?) to publish and subscribe to integration events
  • The platform supports effectively-once delivery
  • The platform is resilient to service bus downtime or failures
  • The platform meets the requirements (performance, message size, ...) of wholesale calculation result publishing
  • The platform and domains support the intentions of ADR-008
  • The platform supports the NFRs (what are they?)
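
As a rough illustration of the dead-letter and effectively-once criteria above, the sketch below assumes at-least-once delivery from the broker and makes the consumer idempotent by remembering processed event ids. It is plain Python rather than the DH3 platform or any service bus SDK, and every name in it (`IntegrationEvent`, `IdempotentConsumer`, `max_attempts`) is hypothetical. The in-memory set and list stand in for what would have to be a durable store and a real dead-letter queue.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class IntegrationEvent:
    event_id: str     # hypothetical unique id assigned by the publisher
    event_type: str
    payload: dict


@dataclass
class IdempotentConsumer:
    """Assumes at-least-once delivery; duplicates are dropped by event id."""
    handler: Callable[[IntegrationEvent], None]
    max_attempts: int = 3
    processed_ids: set = field(default_factory=set)    # stand-in for a durable store
    dead_letters: list = field(default_factory=list)   # stand-in for a dead-letter queue
    attempts: dict = field(default_factory=dict)

    def receive(self, event: IntegrationEvent) -> str:
        """Handle one delivery and report what happened to it."""
        if event.event_id in self.processed_ids:
            return "duplicate"  # already handled once; skip side effects
        self.attempts[event.event_id] = self.attempts.get(event.event_id, 0) + 1
        try:
            self.handler(event)
        except Exception as exc:
            if self.attempts[event.event_id] >= self.max_attempts:
                # Park the event with the failure reason for later inspection.
                self.dead_letters.append((event, repr(exc)))
                return "dead-lettered"
            return "retry"  # the broker is expected to redeliver
        self.processed_ids.add(event.event_id)
        return "processed"
```

In a real implementation, recording the processed id would need to be atomic with the handler's own side effects (for example in the same database transaction); otherwise a crash between the two can still cause a duplicate or a lost event.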

Tech. Notes

See the product teams' initiative in Confluence.

Testability

  • Can be tested?
  • Can be demoed?
  • Verified by UX

How to test

Environment:
User:
Scenario:

@rvplauborg

rvplauborg commented Aug 17, 2023

Bjarke pointed me to this epic for posting a few thoughts on observability:
We should set up tracing so that we can trace across services even when they communicate via asynchronous events rather than synchronous HTTP calls. This means that events should carry an activity id and similar tracing attributes.
It is also probably important to collect metrics such as the number of messages in queues and throughput, so that we can discover when queues are growing and consumers cannot keep up.
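
One possible way to turn the queue metric into an alarm condition is to check whether the sampled queue depth keeps rising. The sketch below is only an illustration: the function name `backlog_is_growing`, the window size, and the "strictly rising" rule are assumptions, and the samples would have to come from whatever monitoring the platform ends up using.

```python
from typing import Sequence


def backlog_is_growing(depth_samples: Sequence[int], min_samples: int = 5) -> bool:
    """Return True when queue depth rose between every consecutive pair
    of the most recent `min_samples` samples (oldest first)."""
    if len(depth_samples) < min_samples:
        return False  # not enough data to call it a trend
    window = depth_samples[-min_samples:]
    return all(later > earlier for earlier, later in zip(window, window[1:]))
```

Requiring several consecutive increases is a deliberately crude filter against short bursts; a production alarm would more likely compare publish and consume rates over a longer period.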

@MadsDue

MadsDue commented Aug 18, 2023

Agree with @rvplauborg. We have done this previously by ensuring that we track a correlation id across all domains for the same "action".

It makes it easier to identify the error in the logs + see all events leading up to the error for the specific action.
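
The correlation-id idea discussed above could look roughly like this. It is a minimal sketch in plain Python, not the API of any DH3 package or tracing library: `EventEnvelope`, `start_action`, `follow_up`, and `log_line` are hypothetical names, and `causation_id` is an extra assumption added to show which event triggered which.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class EventEnvelope:
    event_type: str
    payload: dict
    correlation_id: str                  # shared by every event in the same "action"
    causation_id: Optional[str] = None   # id of the event that triggered this one
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def start_action(event_type: str, payload: dict) -> EventEnvelope:
    """First event of an action: mint a fresh correlation id."""
    return EventEnvelope(event_type, payload, correlation_id=str(uuid.uuid4()))


def follow_up(parent: EventEnvelope, event_type: str, payload: dict) -> EventEnvelope:
    """Event published while handling `parent`: inherit its correlation id."""
    return EventEnvelope(
        event_type,
        payload,
        correlation_id=parent.correlation_id,
        causation_id=parent.event_id,
    )


def log_line(event: EventEnvelope, message: str) -> str:
    """Prefix log output so all lines for one action can be filtered together."""
    return (
        f"correlation_id={event.correlation_id} "
        f"event_id={event.event_id} type={event.event_type} {message}"
    )
```

Filtering logs on a single `correlation_id` then returns every event leading up to an error, regardless of which domain emitted it.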

@BjarkeMeier
Contributor

@rvplauborg and @MadsDue, I'm not sure how the architects want to carve out these features. But I've just now created another one for tracing and diagnostics logging.
https://app.zenhub.com/workspaces/epic-board-6375df2fd6f08e0015e1e0e6/issues/gh/energinet-datahub/green-energy-hub/489

@mogensjuul

@krmoos @rvplauborg
I have no idea what the status of this one is. Can you help?

@rvplauborg

rvplauborg commented Oct 16, 2023

Hi @mogensjuul. I'm not really involved in this task, apart from the one comment I wrote about observability, so I'm afraid I can't give you an answer.

5 participants