
Cluster duplicate handler dying from RuntimeError due to dict change while iterating it #160

Closed
michaelweiser opened this issue Jun 10, 2020 · 2 comments · Fixed by #161

@michaelweiser
Contributor

The following traceback has been seen during shutdown at least once with v2.0:

peekaboo[984]: peekaboo.queuing - (Worker-9) - INFO - Worker 9: Stopped
peekaboo[984]: peekaboo.queuing - (MainThread) - DEBUG - 1: 32 workers still running
peekaboo[984]: peekaboo.db - (MainThread) - DEBUG - Clearing database of all in-flight samples of instance 5010.
peekaboo[984]: peekaboo.db - (MainThread) - DEBUG - Clearing database of all stale in-flight samples (900 seconds)
peekaboo[984]: Exception in thread ClusterDuplicateHandler:
peekaboo[984]: Traceback (most recent call last):
peekaboo[984]:   File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
peekaboo[984]:     self.run()
peekaboo[984]:   File "/opt/peekaboo/local/lib/python3.6/site-packages/peekaboo/queuing.py", line 327, in run
peekaboo[984]:     self.job_queue.submit_cluster_duplicates()
peekaboo[984]:   File "/opt/peekaboo/local/lib/python3.6/site-packages/peekaboo/queuing.py", line 168, in submit_cluster_duplicates
peekaboo[984]:     for sample_hash, sample_duplicates in self.cluster_duplicates.items():
peekaboo[984]: RuntimeError: dictionary changed size during iteration
peekaboo[984]: peekaboo.daemon - (MainThread) - DEBUG - Removing PID file /var/run/peekaboo/peekaboo.pid
systemd[1]: Stopped Peekaboo Extended Email Attachment Behavior Observation Owl.

There seems to be some kind of race in Queue.shut_down() between the cluster duplicate handler and queue shutdown. This is odd because shutting down the duplicate handler is the very first thing triggered, so it should not be running another cleanup pass while the queue is still shutting down its workers.

@michaelweiser michaelweiser added this to the 2.1 milestone Jun 10, 2020
@michaelweiser michaelweiser self-assigned this Jun 10, 2020
@michaelweiser michaelweiser changed the title Cluster duplicate backlog can be corrupted Cluster duplicate handler exception on shutdown Jun 10, 2020
@michaelweiser
Contributor Author

This also happens during normal operation and is due to the changed implementation of dict.items() in Python 3: it now returns a view, which does not respond well to the dictionary being changed during iteration.
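
To illustrate the Python 3 behaviour described above, here is a minimal, self-contained repro (hypothetical dict contents, not the actual Peekaboo data structures):

```python
# Minimal repro (hypothetical data, not Peekaboo code): mutating a dict
# while iterating its items() view raises RuntimeError in Python 3.
cluster_duplicates = {"hash-a": ["dup-1"], "hash-b": ["dup-2"]}

try:
    for sample_hash, duplicates in cluster_duplicates.items():
        # simulate an entry being removed mid-iteration, e.g. by another code path
        del cluster_duplicates[sample_hash]
except RuntimeError as error:
    print(error)  # dictionary changed size during iteration
```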

@michaelweiser
Contributor Author

michaelweiser commented Jun 10, 2020

In our case the cluster duplicate handler thread seems to die mid-operation from the RuntimeError, but the traceback is only logged at shutdown. (This may also be an effect of systemd caching/delaying stderr, but the delay was quite long in the one case I observed. When run interactively from the command line the traceback appears immediately.)

michaelweiser added a commit to michaelweiser/PeekabooAV that referenced this issue Jun 10, 2020
With Python 3 the cluster duplicate handler would die from RuntimeErrors
because the items() accessor of the duplicate backlog dict is a
view/iterator that doesn't respond well to the dict changing while being
iterated. Prevent the RuntimeError by iterating over the items of a copy
of the dict while changing the original, similar to what we're already
doing in the cuckoo job tracker for almost the same reason.

Fixes scVENUS#160.
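
For illustration, a minimal sketch of the pattern the commit message describes, using simplified, hypothetical names rather than the actual PeekabooAV code:

```python
# Sketch of the fix pattern (assumed names, not the real submit_cluster_duplicates()):
# iterate over a snapshot of the backlog so the original dict can be modified
# safely while looping.
cluster_duplicates = {"hash-a": ["dup-1"], "hash-b": ["dup-2"]}

for sample_hash, duplicates in list(cluster_duplicates.items()):
    # safe to drop the entry from the original dict while iterating the copy
    submitted = cluster_duplicates.pop(sample_hash)
    print("submitted %d duplicate(s) of %s" % (len(submitted), sample_hash))
```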
@michaelweiser michaelweiser changed the title Cluster duplicate handler exception on shutdown Cluster duplicate handler dying from RuntimeError due to dict change while iterating it Jun 10, 2020
michaelweiser added a commit that referenced this issue Jun 10, 2020
michaelweiser added a commit that referenced this issue Jun 11, 2020