Garbage collector not working on aws-prod #3975

Closed
mrnicegyu11 opened this issue Mar 15, 2023 · 1 comment · Fixed by #3985
Labels
bug (buggy, it does not work as expected), High Priority (a totally crucial bug/feature to be fixed asap)

Comments

@mrnicegyu11
Member

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

For several days, the garbage collector has only logged errors. There are no more "regular" log lines, and garbage collection appears not to happen at all. The errors look like this:

WARNING: [2023-03-15 12:26:49,577/MainProcess] [servicelib.utils:logged_gather(122)]  -  Error in 1-th concurrent task <coroutine object _remove_single_orphaned_service at 0x7fa1887d8ec0>: 'f2d26379-e6fc-50dd-956a-3f4f67d2542c'
WARNING: [2023-03-15 12:26:49,577/MainProcess] [servicelib.utils:logged_gather(122)]  -  Error in 2-th concurrent task <coroutine object _remove_single_orphaned_service at 0x7fa1886863c0>: '843fbe7b-2e50-56b3-9ad9-752de771bf21'
WARNING: [2023-03-15 12:26:49,577/MainProcess] [servicelib.utils:logged_gather(122)]  -  Error in 3-th concurrent task <coroutine object _remove_single_orphaned_service at 0x7fa188e67ac0>: '41d7bcb2-af42-5104-b662-5c66e747bbf4'
WARNING: [2023-03-15 12:26:49,577/MainProcess] [servicelib.utils:logged_gather(122)]  -  Error in 4-th concurrent task <coroutine object _remove_single_orphaned_service at 0x7fa1887de0c0>: '67c34fc6-fa9f-5eaf-bc0d-8012117707cc'
WARNING: [2023-03-15 12:26:49,577/MainProcess] [servicelib.utils:logged_gather(122)]  -  Error in 5-th concurrent task <coroutine object _remove_single_orphaned_service at 0x7fa1887de8c0>: 'b57f4e59-13d0-476d-9954-9855adf657b7'
WARNING: [2023-03-15 12:26:49,577/MainProcess] [servicelib.utils:logged_gather(122)]  -  Error in 6-th concurrent task <coroutine object _remove_single_orphaned_service at 0x7fa188b57c40>: 'fd123ae9-3242-5eb1-bf02-c04b942f2992'
WARNING: [2023-03-15 12:26:49,577/MainProcess] [servicelib.utils:logged_gather(122)]  -  Error in 7-th concurrent task <coroutine object _remove_single_orphaned_service at 0x7fa188b57e40>: '7e135c19-c89d-5081-bb90-d07ee9d3dc26'
WARNING: [2023-03-15 12:26:49,577/MainProcess] [servicelib.utils:logged_gather(122)]  -  Error in 14-th concurrent task <coroutine object _remove_single_orphaned_service at 0x7fa1885ce4c0>: '0c417ffb-8d03-4b68-9ead-dbef12a4af86'
WARNING: [2023-03-15 12:27:21,783/MainProcess] [servicelib.utils:logged_gather(122)]  -  Error in 1-th concurrent task <coroutine object _remove_single_orphaned_service at 0x7fa1887d8ec0>: 'f2d26379-e6fc-50dd-956a-3f4f67d2542c'
WARNING: [2023-03-15 12:27:21,783/MainProcess] [servicelib.utils:logged_gather(122)]  -  Error in 2-th concurrent task <coroutine object _remove_single_orphaned_service at 0x7fa188b57e40>: '843fbe7b-2e50-56b3-9ad9-752de771bf21'
WARNING: [2023-03-15 12:27:21,783/MainProcess] [servicelib.utils:logged_gather(122)]  -  Error in 3-th concurrent task <coroutine object _remove_single_orphaned_service at 0x7fa18879eb40>: '41d7bcb2-af42-5104-b662-5c66e747bbf4'
WARNING: [2023-03-15 12:27:21,783/MainProcess] [servicelib.utils:logged_gather(122)]  -  Error in 4-th concurrent task <coroutine object _remove_single_orphaned_service at 0x7fa1887de8c0>: '67c34fc6-fa9f-5eaf-bc0d-8012117707cc'
WARNING: [2023-03-15 12:27:21,783/MainProcess] [servicelib.utils:logged_gather(122)]  -  Error in 5-th concurrent task <coroutine object _remove_single_orphaned_service at 0x7fa1887de0c0>: 'b57f4e59-13d0-476d-9954-9855adf657b7'
WARNING: [2023-03-15 12:27:21,783/MainProcess] [servicelib.utils:logged_gather(122)]  -  Error in 6-th concurrent task <coroutine object _remove_single_orphaned_service at 0x7fa1886073c0>: 'fd123ae9-3242-5eb1-bf02-c04b942f2992'
WARNING: [2023-03-15 12:27:21,783/MainProcess] [servicelib.utils:logged_gather(122)]  -  Error in 7-th concurrent task <coroutine object _remove_single_orphaned_service at 0x7fa1886372c0>: '7e135c19-c89d-5081-bb90-d07ee9d3dc26'
WARNING: [2023-03-15 12:27:21,783/MainProcess] [servicelib.utils:logged_gather(122)]  -  Error in 14-th concurrent task <coroutine object _remove_single_orphaned_service at 0x7fa18872e940>: '0c417ffb-8d03-4b68-9ead-dbef12a4af86'
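In the lines above, the text after the colon is only a quoted UUID, and the same UUIDs fail again in the next cycle about 30 seconds later. If that text is `str()` of the raised exception, it is exactly what a `KeyError` prints, because `str(KeyError(k))` equals `repr(k)`, so the log hides the exception type. The sketch below is a hypothetical stand-in for a `logged_gather`-style helper (it is not servicelib's implementation; `gather_and_log` and `remove_orphan` are illustrative names) that logs the exception type next to the message:

```python
import asyncio
import logging
from typing import Any, Awaitable

logger = logging.getLogger("gc_sketch")


async def gather_and_log(*awaitables: Awaitable[Any]) -> list[Any]:
    """Run awaitables concurrently and log every failure with its type.

    Hypothetical helper, NOT servicelib.utils.logged_gather.
    """
    # return_exceptions=True: one failing task does not cancel the others;
    # failures come back as exception objects in the results list
    results = await asyncio.gather(*awaitables, return_exceptions=True)
    for index, result in enumerate(results):
        if isinstance(result, BaseException):
            # str(KeyError("x")) == "'x'", so add the type name explicitly
            logger.warning(
                "Error in %d-th concurrent task: %s(%s)",
                index,
                type(result).__name__,
                result,
            )
    return results


async def remove_orphan(node_id: str, tracked: dict[str, str]) -> str:
    # Hypothetical: looking up a node that is no longer tracked raises KeyError
    return tracked[node_id]


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    tracked = {"known-node": "running"}
    asyncio.run(
        gather_and_log(
            remove_orphan("known-node", tracked),
            remove_orphan("missing-node", tracked),
        )
    )
```

Run as a script, the second task is logged with the message `Error in 1-th concurrent task: KeyError('missing-node')`, which names the failing lookup instead of showing only the key.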

The following Graylog queries can be used to check whether this is happening:

  • container_name:/.*collector.*/ AND NOT "Error in" --> shows all non-error log lines, i.e. the "real" garbage-collection output
  • container_name:/.*collector.*/ AND "Error in" --> shows the errors quoted above
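The same split can be done offline, for example on log lines exported from Graylog. This is a sketch that assumes each record is a `(container_name, message)` pair; `summarize` and the regular expression (written against the warning lines quoted above) are illustrative, not part of osparc. Counting the UUIDs also shows whether the same services fail on every cycle, as they do in the excerpt above.

```python
import re
from collections import Counter
from typing import Iterable

# Written against the "Error in N-th concurrent task ..." warnings quoted above
ERROR_LINE = re.compile(
    r"Error in \d+-th concurrent task <coroutine object (?P<func>\w+) at 0x[0-9a-f]+>: "
    r"'(?P<uuid>[0-9a-f-]{36})'"
)


def summarize(records: Iterable[tuple[str, str]]) -> tuple[int, Counter]:
    """Return (number of regular GC lines, failure count per UUID)."""
    regular = 0
    failing: Counter = Counter()
    for container_name, message in records:
        if "collector" not in container_name:
            continue  # same scope as container_name:/.*collector.*/
        if "Error in" not in message:
            regular += 1  # same as ... AND NOT "Error in"
            continue
        match = ERROR_LINE.search(message)
        if match:  # "Error in" lines in another format are ignored
            failing[match["uuid"]] += 1
    return regular, failing
```

A `regular` count of zero over several days, together with a `failing` counter whose UUIDs repeat every cycle, matches the symptom described in this issue.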

Further evidence that garbage collection is not working: Prometheus shows an s4l-lite service that has been running for many days. To observe this, use the PromQL query:
container_memory_usage_bytes{image=~"^.*osparc[.]io.*/simcore/services/dynamic/s4l-core-lite.*$",name=~"dy-sidecar-b57f4e59-13d0-476d-9954-9855adf657b7.*"}
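A PromQL query like this can also be evaluated from a script through the instant-query endpoint of the Prometheus HTTP API, `/api/v1/query`. A minimal standard-library sketch: the base URL is a placeholder, and reading the `name` label (the container name, e.g. `dy-sidecar-...`) mirrors the selector in the query above.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

# Placeholder: replace with the Prometheus endpoint of the deployment
PROMETHEUS_URL = "http://prometheus.example:9090"


def build_query_url(base_url: str, promql: str) -> str:
    """URL of an instant query against the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def container_names(payload: dict[str, Any]) -> list[str]:
    """Sorted `name` labels of all series in an instant-query response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
    return sorted(series["metric"].get("name", "") for series in payload["data"]["result"])


def query_container_names(base_url: str, promql: str, timeout: float = 10.0) -> list[str]:
    """Run the query and return the container names it matches."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as response:
        return container_names(json.load(response))
```

For example, `query_container_names(PROMETHEUS_URL, 'container_memory_usage_bytes{name=~"dy-sidecar-b57f4e59.*"}')` returns the matching sidecar names for as long as the service keeps running.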
Comparing this with the Redis keys, which correspond to open browser tabs or sessions, shows that for several days there was no session key for the user who owns the project containing this s4l service, so the garbage collector should have kicked in:
redis_key_value{key=~"^user_id=2:.*$"}
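The reasoning above (no session key for the owner for days, so the service is orphaned) can be written down as a small check. This is a deliberately simplified model, not osparc's garbage-collection algorithm: only the `user_id=<id>:` key prefix comes from the query above, and the mapping from running service to owner (`service_owner`) is a hypothetical input.

```python
import re
from typing import Iterable, Mapping

# Only the "user_id=<id>:" prefix is taken from the query above; the rest of the key is ignored
SESSION_KEY = re.compile(r"^user_id=(?P<user_id>\d+):")


def users_with_sessions(keys: Iterable[str]) -> set[int]:
    """User ids that still own at least one session key (open tab or session)."""
    return {int(m["user_id"]) for key in keys if (m := SESSION_KEY.match(key))}


def find_orphans(service_owner: Mapping[str, int], session_keys: Iterable[str]) -> list[str]:
    """Running services (node id -> owner user id) whose owner has no session key left."""
    active = users_with_sessions(session_keys)
    return sorted(node for node, owner in service_owner.items() if owner not in active)
```

Under this model, a node owned by a user with no remaining `user_id=<id>:` key, like the s4l service here, is reported as an orphan that the collector should remove.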

Expected Behavior

The garbage collector runs regularly and removes orphaned services, such as the s4l-lite service above.

Steps To Reproduce

The garbage collector (GC) does not work on aws-prod; the Graylog and Prometheus queries above show the symptoms.

Anything else?

This affects production, which may stop running smoothly if orphaned services accumulate. In my opinion, this should be treated as high urgency.

mrnicegyu11 added the bug and High Priority labels on Mar 15, 2023
mrnicegyu11 added this to the Mithril milestone on Mar 15, 2023
@mrnicegyu11
Member Author

CC @mguidon


3 participants