Garbage collector not working on aws-prod #3975
Labels: bug, High Priority
Current Behavior
For some days now, the garbage collector has only logged errors. There are no more "regular" log lines, and garbage collection does not seem to happen. The errors are of this kind:
The following Graylog queries can be used to check whether this is happening:
container_name:/.*collector.*/ AND NOT "Error in"
--> Displays all non-error log lines, i.e. those referring to "real" garbage collection
container_name:/.*collector.*/ AND "Error in"
--> Displays the errors mentioned above
Further evidence that garbage collection is not working: Prometheus shows an s4l-core-lite service that has been running for many days. To observe this, use the PromQL query:
container_memory_usage_bytes{image=~"^.*[.osparc.io].*/simcore/services/dynamic/s4l-core-lite.*$",name=~"dy-sidecar-b57f4e59-13d0-476d-9954-9855adf657b7.*"}
A comparison with the Redis keys, which correspond to open browser tabs or sessions, shows that for some days there has been no session key for the user who owns the project containing this s4l service, so the garbage collector should have kicked in:
redis_key_value{key=~"^user_id=2:.*$"}
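For illustration, the comparison described above could be scripted roughly as follows. This is only a minimal sketch of the manual check, not the garbage collector's actual logic. It assumes the session keys start with `user_id=<id>:` (as in the query above). The `RunningService` type, its fields, and the function names are hypothetical, and the input lists would have to be filled by hand from Prometheus and Redis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunningService:
    """A running dynamic service (hypothetical type, fields are illustrative)."""
    container_name: str
    owner_user_id: int


def user_ids_with_sessions(redis_keys):
    """Collect user ids from keys that look like 'user_id=<id>:...'.

    Keys that do not match this assumed format are ignored.
    """
    prefix = "user_id="
    ids = set()
    for key in redis_keys:
        head, sep, _rest = key.partition(":")
        if sep and head.startswith(prefix):
            value = head[len(prefix):]
            if value.isdigit():
                ids.add(int(value))
    return ids


def services_without_session(services, redis_keys):
    """Return the services whose owner has no session key at all.

    These are the candidates the garbage collector would be expected
    to clean up.
    """
    active = user_ids_with_sessions(redis_keys)
    return [s for s in services if s.owner_user_id not in active]
```

Any service returned by `services_without_session` is one that is still running although its owner has no open session, which matches the situation reported here for user 2.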
Expected Behavior
Garbage collection works
Steps To Reproduce
The GC does not work on aws-prod
Anything else?
This affects production and may keep it from running smoothly if services accumulate. I would treat this as high urgency.