Relay Server: Not Enough Memory on Health Check even though stats show otherwise #3330
Comments
Duplicate of #3327
I think it's a memory leak.
What I can confirm is that upgrading to 24.9.0 did NOT fix the issue; I still see the same error in the docker compose logs.
What does the event volume look like for you? Did this start happening after upgrading to 24.8.0?
Could you track your RAM/CPU usage as well? Wondering if there is a correlation there.
I can also see errors related to getsentry/snuba#5707 in my logs
I will have to set up some system metrics monitoring to give you the requested info.
Hey @LordSimal, can you try this? In your relay config.yml:
relay:
  upstream: "http://web:9000/"
  host: 0.0.0.0
  port: 3000
logging:
  level: WARN
processing:
  enabled: true
  kafka_config:
    - {name: "bootstrap.servers", value: "kafka:9092"}
    - {name: "message.max.bytes", value: 50000000} # 50MB
  redis: redis://redis:6379
  geoip_path: "/geoip/GeoLite2-City.mmdb"
health:
  max_memory_percent: 1.0
Then restart Relay so the new setting is picked up. Thanks to @Dav1dde
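To apply the change, Relay has to be restarted so it re-reads config.yml. A minimal sketch, assuming the stock self-hosted compose setup where the service is named relay:
docker compose restart relay
docker compose logs --tail 50 relay   # check that it comes back up without the health check error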
Interesting. If you don't mind, could you elaborate on the changes? Is it the same as setting resource limits on each container?
Adjusted the relay config as suggested. Nothing has changed so far, even though there are definitely events that should be coming in. Should I try to just run it again?
It's the kafka process.
I got some news... I just executed it, and suddenly the stats page has updated and there seem to be events present which were not there previously. Also, events are being processed right now and my server is pinned at 100% usage. It seems like something prevented the queue worker from processing the queued events. After around ~15 minutes all queued-up events seem to have been processed and the load is normal again. New Sentry events are also showing up pretty much instantly in the UI, as they did in the past. I will now wait and see if the problem re-occurs.
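One way to confirm that events were stuck in the queue rather than dropped is to look at Kafka consumer-group lag from inside the kafka container. A sketch, assuming the Confluent kafka image used by self-hosted (the CLI tool name and listener address may differ in other setups):
docker compose exec kafka kafka-consumer-groups \
  --bootstrap-server kafka:9092 --describe --all-groups
# large LAG values for the snuba/sentry consumer groups would point at stalled consumers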
Yeah, I will try that the next time it breaks. I already restarted the whole thing.
Just my two cents, but it looks like the max-open-ports issue is the root cause. It seems like one of the services is too slow at processing requests, or the request volume is simply too high. Because nothing can be processed, work accumulates and ends up consuming all resources, with memory and CPU at 100%. Is there any way to throttle Snuba?
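If port or connection exhaustion on the host is the suspicion, a few quick checks can confirm or rule it out. A sketch, assuming a Linux host with iproute2 installed and conntrack loaded:
ss -s                                    # socket counts by state
ss -tan state time-wait | wc -l          # are TIME-WAIT sockets piling up?
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max   # a count close to max means the conntrack table is full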
@LordSimal the stats are handled by the snuba outcomes billing consumer, here: self-hosted/docker-compose.yml, lines 227 to 229 (commit 036f6d4).
Did you see any errors or anything weird coming from that specific container?
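If it helps, the logs of that specific consumer can be pulled directly. The service name below is an assumption based on the compose file referenced above and may differ between releases:
docker compose logs --tail 200 snuba-outcomes-billing-consumer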
Seems like it panicked.
Today at 6 AM (10h ago), events stopped coming in again. Here, again for consistency, are the logs of all containers from the last 12h. I restarted only the kafka container, which did not help.
But what DOES fix the problem is simply restarting all containers.
This of course won't help you understand the root cause of the problem, but I don't know what other information I can provide to debug this.
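The exact commands were lost from the thread; from the self-hosted directory, the two kinds of restart would typically look roughly like this (a sketch, not necessarily what was run here):
docker compose restart kafka   # restart only the kafka service
docker compose down            # stop the whole stack
docker compose up -d           # and bring it back up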
@LordSimal I saw these entries in the logs. I found this issue, but it is specifically for Windows: docker/for-win#8861
We use Debian 12 with Docker installed from the official Docker repository.
Something is definitely wrong with the Docker internal network; even simple operations were failing. I had to restart the whole Docker service to get things working again.
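On a systemd-based host such as Debian 12, restarting the daemon would typically be done as follows; note that this briefly stops every container:
sudo systemctl restart docker
docker compose up -d           # bring the stack back up if it does not restart automatically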
We have the same problem. Restarting the docker daemon helps. But this is not a good solution right now.
Just to be extra sure it's not a RAM issue: we just upgraded from 32GB to 64GB, but the problem still persists. Sentry always ends up using about half of the available RAM as a baseline, no matter how much you have. Looking at the previous comments, this indeed seems like a Docker internal network problem.
Hey @LordSimal, did you configure a low swappiness setting? I encountered a similar issue running Sentry on a virtualized machine when the swappiness was set to 10. Changing it back to the default value (60) resolved the problem for now, though I'll continue monitoring it.
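For reference, checking and resetting swappiness on Linux looks like this; persisting it via /etc/sysctl.d is an assumption about how the host is managed:
cat /proc/sys/vm/swappiness                     # current value
sudo sysctl vm.swappiness=60                    # kernel default, effective until reboot
echo 'vm.swappiness=60' | sudo tee /etc/sysctl.d/99-swappiness.conf   # persist across reboots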
Just to inform you: Sentry regularly stops receiving messages after 1-2 days, so I added a crontab entry to automatically restart all containers at 02:00 in the morning to work around the issue. I will try to disable that automatic restart after the next self-hosted update, whenever that releases.
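For anyone wanting the same workaround, a cron entry along these lines would do it; the checkout path and log file below are assumptions, adjust them to the actual install location:
# /etc/cron.d/sentry-restart  (cron.d entries require the user column)
0 2 * * * root cd /opt/sentry/self-hosted && /usr/bin/docker compose restart >> /var/log/sentry-restart.log 2>&1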
So we have been using 24.10.0 for 1 week now and it seems to be more stable. I think we restarted the containers once manually because no events were being processed anymore, but it's not as frequent as it used to be. I'll close this issue since it didn't seem to be that widespread, so it may be something related to our setup/network. Thanks to everyone participating and trying to find this weird bug 👍🏻 hopefully it doesn't return anytime soon.
I actually experience the same issues (even on 24.10.0). Setting the option suggested above fixes it, but I guess that's more of a workaround than a real solution...
Self-Hosted Version
24.8.0
CPU Architecture
x86_64
Docker Version
27.2.1
Docker Compose Version
2.29.2
Steps to Reproduce
Can't really tell how to reproduce, since it just happens out of nowhere.
Expected Result
Sentry receives errors again
Actual Result
Sentry stops receiving errors after 1-2 days of normal usage.
Checking the docker logs, there are a lot of the "not enough memory" health check entries present, but checking htop we have enough RAM, and checking docker container stats there are no containers using more than 95% RAM.
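For reference, the host- and container-level checks described above can be reproduced with (assuming a Linux host):
free -h                   # overall RAM/swap usage on the host
docker stats --no-stream  # one-off per-container CPU/memory snapshot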
docker_compose_logs.txt
latest_install_logs.txt
Event ID
No response