feat(webhooks): implement automatic retries for failed webhook deliveries using scheduler #3842
Merged
…event_and_trigger_outgoing_webhook()` function
The visibility of `create_event_and_trigger_outgoing_webhook()` and `trigger_webhook_to_merchant()` functions is now private instead of public.
… of `OutgoingWebhookEventMetric` trait to `get_outgoing_webhook_event_content()`
…ept a `tag` parameter
…ebhook delivery attempt
… when raising analytics event
…ry configuration
…onnectorPTMapping` and `PaymentMethodsPTMapping` types
…d for 2nd retry attempt instead of 1st retry attempt
…in database when retrying webhooks from scheduler
…nse handling code to closures
…tes and disputes webhooks
…urrent resource status
…ng initial delivery attempt
sahkal previously approved these changes on Feb 28, 2024
hrithikesh026 previously approved these changes on Feb 28, 2024
LGTM
SanchithHegde dismissed stale reviews from hrithikesh026 and sahkal via 2c91e0f on February 29, 2024 11:00
sahkal previously approved these changes on Feb 29, 2024
hrithikesh026 previously approved these changes on Feb 29, 2024
sai-harsha-vardhan previously approved these changes on Mar 1, 2024
Narayanbhat166 previously approved these changes on Mar 1, 2024
SanchithHegde dismissed stale reviews from Narayanbhat166, sai-harsha-vardhan, hrithikesh026, and sahkal via a38212f on March 3, 2024 13:47
Narayanbhat166 approved these changes on Mar 4, 2024
sahkal approved these changes on Mar 4, 2024
lsampras approved these changes on Mar 4, 2024
SanchithHegde removed the S-waiting-on-review label (Status: This PR has been implemented and needs to be reviewed) on Mar 4, 2024
Labels
A-core (Area: Core flows)
A-process-tracker (Area: Process tracker)
A-webhooks (Area: Webhook flows)
C-feature (Category: Feature request or enhancement)
Type of Change
Description
This PR adds support for automatically retrying outgoing webhook deliveries in case of failed deliveries, with the aid of the scheduler (process tracker).
Behavior (Initial Delivery)
The behavior when an outgoing webhook is being sent is:
During the initial webhook delivery attempt, a process tracker task is added before the webhook is sent, because the delivery itself runs on a background thread that we have no control over once it is launched. The task's scheduled time is determined from configuration (persisted in the database/Redis), as explained under Configuring Retry Intervals.
Next, we trigger the webhook to the merchant and an analytics event is raised, as before.
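The ordering above (persist the retry task first, then hand delivery off to a background thread) can be sketched as follows. This is a minimal illustration, not the PR's code: the in-memory `Tracker`, `TrackerTask`, `trigger_initial_delivery`, and the `PENDING` status are hypothetical stand-ins; only the `INITIAL_DELIVERY_ATTEMPT_SUCCESSFUL` business status comes from this PR. The analytics event is omitted.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a process tracker entry.
#[derive(Debug, Clone, PartialEq)]
pub struct TrackerTask {
    pub schedule_after_secs: u64,
    pub business_status: String,
}

/// Hypothetical in-memory stand-in for the process tracker table, keyed by event ID.
pub type Tracker = Arc<Mutex<HashMap<String, TrackerTask>>>;

/// Adds the retry task *before* delivery starts, then delivers on a background
/// thread. `deliver` is assumed to return `true` when the merchant accepts the webhook.
pub fn trigger_initial_delivery<F>(
    tracker: &Tracker,
    event_id: &str,
    start_after_secs: u64,
    deliver: F,
) -> thread::JoinHandle<()>
where
    F: FnOnce() -> bool + Send + 'static,
{
    // 1. Persist the task first: once delivery is spawned we no longer control
    //    it, so the retry must already be scheduled in case it fails.
    tracker.lock().unwrap().insert(
        event_id.to_string(),
        TrackerTask {
            schedule_after_secs: start_after_secs,
            business_status: "PENDING".to_string(),
        },
    );

    // 2. Deliver on a background thread.
    let tracker = Arc::clone(tracker);
    let event_id = event_id.to_string();
    thread::spawn(move || {
        if deliver() {
            // The initial attempt succeeded, so the scheduled retry is not needed.
            if let Some(task) = tracker.lock().unwrap().get_mut(&event_id) {
                task.business_status = "INITIAL_DELIVERY_ATTEMPT_SUCCESSFUL".to_string();
            }
        }
        // On failure the task stays pending and the scheduler picks it up later.
    })
}
```

Returning the `JoinHandle` only makes the sketch testable; a fire-and-forget caller would simply drop it.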
Behavior (Automatic Retry)
During the automatic retry attempt(s) from the process tracker, the current state of the resource that the webhook is about is fetched from the database and included in the webhook payload.
If the status of the resource differs from its status when the webhook event was created (that is, the resource has since transitioned to another status), the process tracker task is aborted with a business status indicating the status mismatch (RESOURCE_STATUS_MISMATCH).
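One scheduled retry attempt can be sketched as a pure decision function. The business status names are the ones mentioned in this PR; `RetryOutcome`, `run_retry`, the string statuses, and the `retries_done`/`max_retries` bookkeeping are hypothetical simplifications, not the actual workflow code.

```rust
/// Outcome of one scheduled retry. Status strings follow the PR description;
/// the type itself is a hypothetical sketch.
#[derive(Debug, PartialEq)]
pub enum RetryOutcome {
    /// The resource changed status since the event was created; task aborted.
    ResourceStatusMismatch,
    /// The retried delivery succeeded.
    CompletedByPt,
    /// Delivery failed and retries remain; the task should be rescheduled.
    Reschedule,
    /// Delivery failed and no retries remain.
    RetriesExceeded,
}

impl RetryOutcome {
    /// Terminal business status, or `None` while the task is still retrying.
    pub fn business_status(&self) -> Option<&'static str> {
        match self {
            RetryOutcome::ResourceStatusMismatch => Some("RESOURCE_STATUS_MISMATCH"),
            RetryOutcome::CompletedByPt => Some("COMPLETED_BY_PT"),
            RetryOutcome::RetriesExceeded => Some("RETRIES_EXCEEDED"),
            RetryOutcome::Reschedule => None,
        }
    }
}

/// `retries_done` is the number of retries already attempted before this one;
/// `deliver` receives the freshly fetched status, standing in for the payload.
pub fn run_retry<F>(
    status_at_event_creation: &str,
    current_status: &str,
    retries_done: u32,
    max_retries: u32,
    deliver: F,
) -> RetryOutcome
where
    F: FnOnce(&str) -> bool,
{
    // The payload carries the *current* resource state; if the resource has
    // moved on, this event no longer describes it, so abort.
    if current_status != status_at_event_creation {
        return RetryOutcome::ResourceStatusMismatch;
    }
    if deliver(current_status) {
        RetryOutcome::CompletedByPt
    } else if retries_done + 1 >= max_retries {
        // This attempt was the last one allowed.
        RetryOutcome::RetriesExceeded
    } else {
        RetryOutcome::Reschedule
    }
}
```

Checking the status before delivering means a stale event is never re-sent with a payload describing a different state.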
When triggering the webhook to the merchant:
Configuring Retry Intervals
The retry intervals for webhook deliveries are determined by runtime configuration (persisted in the database/Redis) under the key pt_mapping_outgoing_webhooks. If the configuration value is not available in storage, a default value hardcoded in the application is used: hyperswitch/crates/scheduler/src/consumer/types/process_data.rs
Lines 69 to 89 in 1d3bf5c
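The lookup can be sketched as: use the stored configuration if present (else the hardcoded default), prefer a merchant-specific mapping over `default_mapping`, and derive each delay from `start_after`, `frequency` and `count`. The field names come from this PR; the struct names, methods, and delay semantics (retry count 0 waits `start_after`, retry count n waits the `frequency` of the bucket whose cumulative `count` first reaches n) are assumptions, and the values in the usage check are illustrative, not the real defaults.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of one retry mapping; times are in seconds.
#[derive(Debug, Clone)]
pub struct RetryMapping {
    pub start_after: u64,
    pub frequency: Vec<u64>,
    pub count: Vec<u32>,
}

impl RetryMapping {
    /// Maximum number of retries: the sum of `count` (e.g. [2, 3] gives 5).
    pub fn max_retries(&self) -> u32 {
        self.count.iter().sum()
    }

    /// Assumed semantics: retry_count 0 (the task's first scheduled run) waits
    /// `start_after`; retry_count n >= 1 waits `frequency[i]` for the first
    /// bucket i whose cumulative `count` reaches n. `None` means exhausted.
    pub fn schedule_delay(&self, retry_count: u32) -> Option<u64> {
        if retry_count == 0 {
            return Some(self.start_after);
        }
        let mut cumulative = 0;
        for (&freq, &cnt) in self.frequency.iter().zip(&self.count) {
            cumulative += cnt;
            if retry_count <= cumulative {
                return Some(freq);
            }
        }
        None
    }
}

/// Hypothetical mirror of the value stored under `pt_mapping_outgoing_webhooks`.
#[derive(Debug, Clone)]
pub struct WebhookRetryConfig {
    pub default_mapping: RetryMapping,
    pub custom_merchant_mapping: HashMap<String, RetryMapping>,
}

impl WebhookRetryConfig {
    /// Use the stored config if present, else fall back to the hardcoded one.
    pub fn from_storage_or(stored: Option<Self>, hardcoded: impl FnOnce() -> Self) -> Self {
        stored.unwrap_or_else(hardcoded)
    }

    /// A merchant-specific mapping takes precedence over `default_mapping`.
    pub fn mapping_for(&self, merchant_id: &str) -> &RetryMapping {
        self.custom_merchant_mapping
            .get(merchant_id)
            .unwrap_or(&self.default_mapping)
    }
}
```

With `count` set to `[2, 3]`, `max_retries()` returns 5, matching the arithmetic in the testing notes.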
If the default configuration needs to be overridden, this can be done using the configuration create (or update) endpoint:
This would override the configuration for all merchants. Note that frequency and start_after are specified in seconds. If the configuration needs to be overridden only for a specific merchant, the custom_merchant_mapping field must contain an object similar to default_mapping, keyed by the merchant ID:
Motivation and Context
Closes #217.
How did you test it?
As of now, outgoing webhooks are supported for payments, refunds, disputes and mandates. I've extensively tested payments outgoing webhooks for the different cases, and done sanity testing on disputes and refunds webhooks to verify that they are retried if the initial delivery attempt fails. Incoming mandates webhooks are integrated for the GoCardless connector, but the integration is broken, so I couldn't test mandates webhook retries.
To simulate failed webhook deliveries, the merchant webhook URL has to be configured to a URL that does not accept POST requests. After a couple of failed retries, the URL can be updated to a valid one to try out the case where a retried delivery attempt succeeds.
Since the hardcoded default retry configuration spans multiple hours, I configured the application to use shorter intervals:
The process_tracker table can be queried for the relevant record using this query:
If the initial delivery attempt is successful, the business status of the process tracker entry is set to INITIAL_DELIVERY_ATTEMPT_SUCCESSFUL:
If one of the retried delivery attempts is successful, the business status of the process tracker entry is set to COMPLETED_BY_PT:
If none of the delivery attempts is successful, the business status of the process tracker entry is set to RETRIES_EXCEEDED:
In my case, the configured count was [2, 3], so the maximum number of retries is 2 + 3 = 5.
As for testing refunds and disputes webhooks, the screenshots are attached below:
Refunds (note the event_type and event_class fields):
Disputes (note the event_type and event_class fields); the business status is RESOURCE_STATUS_MISMATCH because the dispute status changed between the time the event was created and the time the webhook was retried:
In addition, successful and failed task additions should raise the appropriate metrics (TASKS_ADDED_COUNT and TASK_ADDITION_FAILURES_COUNT respectively), and suitable logs are emitted when delivering webhooks to the merchant fails and when retry configs are read from the database.
The PR also adds unit tests for a scheduler utility function I refactored, which can be run using the command:
cargo test --package scheduler --lib -- utils::tests
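The metric behaviour described in the testing notes (TASKS_ADDED_COUNT on a successful task addition, TASK_ADDITION_FAILURES_COUNT on a failed one) can be sketched with plain atomic counters. This assumes a task insertion that returns a `Result`; the atomics and `add_task_with_metrics` are hypothetical stand-ins for the application's real metrics, which are not shown in this PR description.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-ins for the TASKS_ADDED_COUNT and
/// TASK_ADDITION_FAILURES_COUNT metrics.
static TASKS_ADDED_COUNT: AtomicU64 = AtomicU64::new(0);
static TASK_ADDITION_FAILURES_COUNT: AtomicU64 = AtomicU64::new(0);

/// Runs a task insertion and bumps exactly one counter depending on its result,
/// returning the result unchanged so callers can still log or propagate errors.
pub fn add_task_with_metrics<T, E, F>(insert: F) -> Result<T, E>
where
    F: FnOnce() -> Result<T, E>,
{
    let result = insert();
    match &result {
        Ok(_) => TASKS_ADDED_COUNT.fetch_add(1, Ordering::Relaxed),
        Err(_) => TASK_ADDITION_FAILURES_COUNT.fetch_add(1, Ordering::Relaxed),
    };
    result
}
```

`Ordering::Relaxed` suffices here because the counters are independent tallies that do not guard any other data.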
Checklist
cargo +nightly fmt --all
cargo clippy