Defensive measures #56

Merged
merged 14 commits into from
Aug 16, 2023


@tkiapril tkiapril commented Aug 11, 2023

Quite a few changes had to be made, so unfortunately this PR is a bit big.
What it changes:

  • The emitter no longer pulls the event to be emitted from the DB; instead, it uses the payload from the event queue.
  • Finalization is now handled in the observer: only events that have been finalized are fetched, instead of fetching events from the latest blocks and watching them for finalization.
  • The event sources are now protected with a mutex, so that they do not change during finalization.
  • blockTimestamp is removed from the schema, observedAt is renamed to observedTimestamp, and blockNumber_logIndex is used as the PK for events.
  • Failed webhook posts are retried, and the failed URLs are saved to the DB in case the emitter dies while retrying, so that only failed URLs are retried even after a restart.
  • A lock is implemented for re-publishing events left stale by an emitter outage. The emitter locks an event while processing it, and the observer watches for stale events, i.e. events for which 300 seconds have passed since the last lock. This ensures that stale events are re-published, while events still in the event queue are not.
  • The observer records the indices of finalized blocks, along with any failures. It now uses this information to retry failed and missed blocks.
  • The events queue is now declared durable, and messages are published as persistent.
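
The stale-event check from the lock item above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `StoredEvent` shape and the names `lockedAt`, `emitted`, `isStale`, and `selectStaleEvents` are assumptions; only the 300-second threshold comes from the description.

```typescript
// Hypothetical shape of a stored event; field names are assumptions.
interface StoredEvent {
  blockNumber: number;
  logIndex: number;
  lockedAt: Date | null; // set by an emitter when it starts processing the event
  emitted: boolean;      // true once every webhook has accepted the event
}

// The 300-second threshold mentioned in the description.
const STALE_AFTER_MS = 300 * 1000;

// An event is stale when an emitter locked it, never finished,
// and the lock is older than the threshold.
function isStale(event: StoredEvent, now: Date): boolean {
  if (event.emitted || event.lockedAt === null) {
    // Never-locked events are still waiting in the queue and are skipped.
    return false;
  }
  return now.getTime() - event.lockedAt.getTime() > STALE_AFTER_MS;
}

// What an observer could run periodically to pick events to re-publish.
function selectStaleEvents(events: StoredEvent[], now: Date): StoredEvent[] {
  return events.filter((e) => isStale(e, now));
}
```

Treating a missing lock as "still queued" is what keeps events that no emitter has picked up yet from being published twice.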

Component diagram:

graph LR
classDef TransparentSubgraph fill:transparent,stroke-width:0;
classDef vellip fill:transparent,stroke-width:0
    RPC1[/RPC/] --> Observer1
    RPC2[/RPC/] --> Observer2
    RPC3[/RPC/] --> Observer3
    RPC4[/RPC/] --> Observer4
    RPC5[/RPC/] --> Observer5
    Emitter1 ------ Emitter1Out[ ]
    Emitter2 ------ Emitter2Out[ ]
    Emitter3 ------ Emitter3Out[ ]
    Emitter4 ------ Emitter4Out[ ]
    Emitter5 ------ Emitter5Out[ ]
    style Emitter1Out width:0
    style Emitter2Out width:0
    style Emitter3Out width:0
    style Emitter4Out width:0
    style Emitter5Out width:0
    Emitter1Out & Emitter2Out & Emitter3Out & Emitter4Out & Emitter5Out --> Webhook1 & Webhook2 & Webhook3 & Webhook4 & Webhook5
subgraph Webhooks[ ]
    Webhook1([Webhook])
    Webhook2([Webhook])
    Webhook3([Webhook])
    Webhook4([Webhook])
    vellip2["⋮"]:::vellip
    Webhook5([Webhook])
end
subgraph Dependent[Dependent on other components]
    subgraph Observers[ ]
        Observer1[Observer]
        Observer2[Observer]
        Observer3[Observer]
        Observer4[Observer]
        vellip1["⋮"]:::vellip
        Observer5[Observer]
    end
    class Observers TransparentSubgraph
    subgraph Emitters[ ]
        Emitter1[Emitter]
        Emitter2[Emitter]
        Emitter3[Emitter]
        Emitter4[Emitter]
        vellip3["⋮"]:::vellip
        Emitter5[Emitter]
    end
    class Emitters TransparentSubgraph
    Observer1 --- Observer1Out[ ]
    Observer2 --- Observer2Out[ ]
    Observer3 --- Observer3Out[ ]
    Observer4 --- Observer4Out[ ]
    Observer5 --- Observer5Out[ ]
    style Observer1Out width:0
    style Observer2Out width:0
    style Observer3Out width:0
    style Observer4Out width:0
    style Observer5Out width:0

    Observer1Out & Observer2Out & Observer3Out & Observer4Out & Observer5Out --- AMQPIn[ ] --> AMQP>AMQP Broker Cluster] --- AMQPOut[ ] --> Emitter1 & Emitter2 & Emitter3 & Emitter4 & Emitter5
    style AMQPIn width:0
    style AMQPOut width:0
    style DBIn width:0
end
class Webhooks TransparentSubgraph
subgraph Independent[Independent components]
    API1[API Server]
    API2[API Server]
    API3[API Server]
    API4[API Server]
    vellip4["⋮"]:::vellip
    API5[API Server]
end
style Independent fill:skyblue,stroke:blue
Observer1Out & Observer2Out & Observer3Out & Observer4Out & Observer5Out --- DBIn[ ] -----> DB[(DB Cluster)] --- DBOut[ ] --> API1 & API2 & API3 & API4 & API5
style DBOut width:0

Test description to see if everything recovers correctly (set finalization offset to 0):

  • Stop RPC, send tx, start RPC, see if Observer correctly recovers missing blocks
  • Stop RPC, send tx, wait until Observer fails to retrieve block, stop Observer, start RPC, start Observer, see if Observer correctly recovers missing blocks
  • Stop RPC, send tx, wait >300s, start RPC, see if Observer correctly recovers missing blocks (to see if code -32000 occurs)
  • Stop all Observers, send tx, start Observer, see if Observer correctly recovers missing blocks
  • Stop all Emitters, send tx, start Emitter, see if Emitter correctly emits all events
  • Stop all Emitters, send tx, wait for Observer to queue, stop Observer, start Emitter, see if Emitter correctly emits all events
  • Stop all Emitters, send tx, wait for Observer to queue, stop Observer, restart AMQP Broker, start Emitter, see if Emitter correctly emits all events
  • Stop a webhook, send tx, see if emitter retries, start webhook, see if emitter finishes emitting
  • Stop a webhook, send tx, wait for emitter to retry, stop the retrying emitter, start webhook, wait >300s, see if observer re-publishes, see if other emitter finishes emitting
  • Stop a webhook, send tx, wait for emitter to retry, stop all emitters, start webhook, wait >300s, see if observer re-publishes, start all emitters, see if emitter finishes emitting
  • Stop a webhook, send tx, wait for emitter to retry, stop all emitters, stop all observers, wait >300s, start emitters, start observers, see if observer re-publishes, see if emitter finishes emitting
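
The "recovers missing blocks" checks above rely on the observer knowing which finalized blocks it has processed and which it has not. A rough sketch of that selection, assuming a hypothetical in-memory record keyed by block index (the names `BlockStatus` and `blocksToRetry` are illustrative, and the PR stores this information in the DB rather than in a `Map`):

```typescript
// Hypothetical per-block record: "ok" when processed, "failed" when fetching failed.
type BlockStatus = "ok" | "failed";

// Returns the block indices in [from, to] that still need fetching:
// blocks recorded as failed, and blocks that were never recorded at all.
function blocksToRetry(
  recorded: Map<number, BlockStatus>,
  from: number,
  to: number,
): number[] {
  const result: number[] = [];
  for (let index = from; index <= to; index++) {
    if (recorded.get(index) !== "ok") {
      result.push(index);
    }
  }
  return result;
}
```

Here a block with no record at all (for example, one finalized while every observer was stopped) is handled the same way as a block whose fetch failed.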

Note that although events from RPC should not be lost, there can be extreme conditions in which some events are not emitted, namely when the AMQP broker dies and the whole system goes down before the broker has had a chance to persist the message.
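
For reference, a durable queue with persistent messages could be declared roughly as below. This is a sketch under assumptions: `ChannelLike` is a hypothetical structural type whose method and option names mirror the amqplib package (`assertQueue` with `durable`, `sendToQueue` with `persistent`), the queue name `events` is made up, and whether the PR uses amqplib at all is not stated here.

```typescript
// Minimal structural type for the two channel calls used below.
// Method and option names mirror amqplib; the type itself is hypothetical.
interface ChannelLike {
  assertQueue(queue: string, options: { durable: boolean }): Promise<unknown>;
  sendToQueue(
    queue: string,
    content: Uint8Array,
    options: { persistent: boolean },
  ): boolean;
}

const EVENT_QUEUE = "events"; // hypothetical queue name

// Declare the queue as durable so its definition survives a broker restart.
function declareEventQueue(channel: ChannelLike): Promise<unknown> {
  return channel.assertQueue(EVENT_QUEUE, { durable: true });
}

// Publish the payload as a persistent message so the broker writes it to disk.
// With amqplib itself the content would be a Buffer, e.g. Buffer.from(json).
function publishEvent(channel: ChannelLike, payload: object): boolean {
  const content = new TextEncoder().encode(JSON.stringify(payload));
  return channel.sendToQueue(EVENT_QUEUE, content, { persistent: true });
}
```

Both settings are needed together: a persistent message in a non-durable queue is lost along with the queue on restart. Even with both, a message can still be lost if the broker dies before writing it to disk, which is the extreme case the note above describes.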

... instead of reading from DB.
Emitter will retry posting events for failed webhook destinations.
However, when emitter recovery is implemented, if an event has not
yet been successfully emitted and the emitter restarts, the emitter
will emit the message to all the destinations, including those that
already received it successfully.
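
The PR description states that failed URLs are saved so that only they are retried after a restart. One way to model that bookkeeping is sketched below; the `DeliveryState` shape and the function names are assumptions for illustration, and the actual HTTP posting and DB persistence are left out.

```typescript
// Hypothetical delivery record persisted alongside an event.
interface DeliveryState {
  eventId: string;       // e.g. "<blockNumber>_<logIndex>"
  pendingUrls: string[]; // destinations that have not accepted the event yet
}

// Before the first attempt, every destination is pending.
function initialState(eventId: string, urls: string[]): DeliveryState {
  return { eventId, pendingUrls: [...urls] };
}

// Apply the outcome of one round of posts: keep only the URLs that failed.
// Saving the returned state lets a restarted emitter skip delivered URLs.
function afterAttempt(state: DeliveryState, succeeded: Set<string>): DeliveryState {
  return {
    ...state,
    pendingUrls: state.pendingUrls.filter((url) => !succeeded.has(url)),
  };
}

// The event is fully emitted once no destination is pending.
function isDone(state: DeliveryState): boolean {
  return state.pendingUrls.length === 0;
}
```

Because the state only ever shrinks, re-running `afterAttempt` after a restart cannot bring back a destination that already succeeded.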
eseiker previously approved these changes Aug 16, 2023
@pull-request-quantifier-deprecated

This PR has 878 quantified lines of changes. In general, a change size of up to 200 lines is ideal for the best PR experience!


Quantification details

Label      : Extra Large
Size       : +528 -350
Percentile : 95.93%

Total files changed: 11

Change summary by file extension:
.ts : +502 -336
.yml : +1 -1
.prisma : +25 -13

Change counts above are quantified counts, based on the PullRequestQuantifier customizations.

@tkiapril tkiapril merged commit 58a8b50 into planetarium:main Aug 16, 2023
3 checks passed
@tkiapril tkiapril deleted the feat/recover branch August 16, 2023 21:45