Svix worker does not work as expected #1511
Comments
Basically, the background worker that sends messages to endpoints stopped working with no sign of failure, and after we restarted it, it has been crashing repeatedly due to some corrupt data. |
Team is looking! |
Are the errors preventing you from processing other messages in the queue? My expectation is that other items should still get processed even if some are failing due to the error you reported. Note that our more recent versions support a Redis DLQ, which sends repeatedly failing messages to a dead-letter queue. This might help clear up the errors you're facing. |
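To illustrate the dead-letter idea mentioned above, here is a minimal in-memory sketch. It is not Svix's implementation: the `drain` function, the `(task, attempts)` tuple layout, and the `MAX_ATTEMPTS` value are all hypothetical names chosen for the example. It only shows the general pattern of retrying a failing task a bounded number of times and then parking it so that it stops blocking or spamming the main queue.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical retry budget, not a Svix setting


def drain(queue, handler, max_attempts=MAX_ATTEMPTS):
    """Process queued (task, attempts) pairs; park repeat failures in a dead-letter list."""
    processed = []
    dead_letters = []
    while queue:
        task, attempts = queue.popleft()
        try:
            handler(task)
            processed.append(task)
        except Exception as exc:
            if attempts + 1 >= max_attempts:
                # Retry budget exhausted: keep the task and the error for manual inspection.
                dead_letters.append((task, repr(exc)))
            else:
                # Put the task back at the end so healthy tasks are not blocked.
                queue.append((task, attempts + 1))
    return processed, dead_letters
```

With this shape, a task that always fails is retried behind the healthy tasks and ends up in `dead_letters` after `max_attempts` tries, while everything else is still delivered.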
I have two problems here. The second one is an error that I can reproduce locally:
To my understanding, my version of Svix doesn’t check for null endpoints before mapping them to messagedestination, which results in null inserts into the database. This behavior was fixed from v.1.4.12 |
The error about violating the DB constraint doesn’t prevent my Svix instance from processing other messages; it’s just noisy and consumes CPU and memory. Do you have any suggestions for finding out why the Svix worker stopped working unexpectedly without any sign of failure? I created a message and it wasn’t sent to my endpoint until the container was restarted. |
No, other messages are handled properly when the worker is restarted. We're able to reproduce the error by doing the following steps:
We were also able to pinpoint the code that causes the problem in version 0.74, and we can handle the failed tasks by undeleting the endpoint. The remaining mystery is how the worker suddenly stopped working with no sign of errors, which is troublesome since it is the vital part that actually sends messages to our customers' APIs. We're planning to upgrade to a newer version of Svix to see if this problem reappears, and we have the following questions:
Thank you for your help. |
Just double-check release notes, which I believe you have already done.
I would upgrade to the latest version, though of course you should test in a non-production environment prior to deploying.
What we generally do in our own environments is run the API server inside of the worker container as well and probe the /health endpoint. This endpoint will do basic checks of queue, database, and cache health. Beyond that, we publish some basic OTEL metrics that monitor depth of the various queues. You can monitor the
The DLQ feature is enabled by default in recent Svix versions and will move messages to a queue with a default name of Otherwise, you'll need to manually inspect messages in this queue to try and troubleshoot why they failed. |
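As a rough illustration of probing the health endpoint described above, a small check like the following could run from a container healthcheck or a cron job. The URL (host, port, and exact path) is an assumption and must be adjusted to your deployment; `is_healthy` and its `opener` parameter are hypothetical names introduced here so the function can be exercised without a running server.

```python
import urllib.error
import urllib.request

# Assumed URL: adjust host, port, and path to match your deployment.
HEALTH_URL = "http://localhost:8071/health"


def is_healthy(url=HEALTH_URL, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the health endpoint answers with a 2xx status."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx HTTPError all count as unhealthy.
        return False
```

A wrapper script could exit non-zero when `is_healthy()` returns False so that the orchestrator restarts the container.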
Also, thanks for the repro! I'm able to reproduce this in our latest code and can confirm this is still an issue. We will have a fix shortly 🤞 |
Ensure that we don't try to process a message for which endpoints no longer exist. Fixes svix#1511.
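The guard described in this fix can be sketched in a few lines. This is an illustration of the idea only, not the actual Svix code: `Destination`, `build_destinations`, and the `lookup` dictionary are hypothetical stand-ins for the real endpoint lookup and the messagedestination rows.

```python
from dataclasses import dataclass


@dataclass
class Destination:
    msg_id: str
    endpoint_id: str


def build_destinations(msg_id, endpoint_ids, lookup):
    """Create destination rows only for endpoints that still exist."""
    destinations = []
    skipped = []
    for endpoint_id in endpoint_ids:
        if lookup.get(endpoint_id) is None:
            # Deleted or unknown endpoint: skip it instead of inserting a null row.
            skipped.append(endpoint_id)
            continue
        destinations.append(Destination(msg_id, endpoint_id))
    return destinations, skipped
```

Returning the skipped IDs alongside the rows makes it easy to log which endpoints were dropped rather than failing the whole task on a constraint violation.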
@tu-pm We have released version 1.40.0 to address the issue: https://github.com/svix/svix-webhooks/releases/tag/v1.40.0 |
Bug Report
Version
V0.74
Platform
Ubuntu 20.04
Description
We run Svix in both worker and API mode within one container, using RedisCluster and Postgres.
One day, its worker stopped working despite still receiving API requests: "event sent >" that put Task in queue "event recv <". As a result, messages were not delivered to our clients.
We managed to get Svix working again by restarting the container, and messages can be delivered to clients again, but there’s another problem.
Svix continuously shows
Checking database logs, it shows
Looking at the source code, I found only one place that inserts into the messagedestination table: process_task. I think the variable destinations, mapped from endpoints, could be null.
I hope to get help with some questions: