-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix - Race condition when trying to add a log file that's been scheduled for removal #1302
Conversation
ea66872
to
59f09e8
Compare
# We don't want to be creating an empty default dict by get. | ||
if path not in self.__paths: | ||
return {}.items() | ||
|
||
return self.__paths[path].items() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up to you but I think this is ok: return self.__paths.get(path, {}).items()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, definitely!
@@ -435,7 +445,13 @@ def keys(self): | |||
|
|||
def pop(self, path, worker_id, default=None): | |||
# type: (six.text_type, six.text_type, Optional[object]) -> Optional[object] | |||
return self.__paths[path].pop(worker_id, default) | |||
|
|||
value = self.__paths[path].pop(worker_id, default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to be careful that path exists in self.__paths?
Maybe better to be value = self.__paths.get(path, {}).pop(worker_id, default)
or something similar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.__path is a defaultdict. But this way it's not consistent and a bit confusing. I will ditch the defaultdict, use this way and modify a set method as well so it's obvious what's expected to happen.
|
||
value = self.__paths[path].pop(worker_id, default) | ||
|
||
if not self.__paths[path]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly this could be if path in self.__paths and not self.__paths[path]:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will work because of defaultdict, but it's confusing, I will change it. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. One consideration for future work, to make the use of the lock clearer could use with self.__lock:
to avoid having to explicitly release it
Based on the comments the lock was initially meant to lock status fields used in generate_status.
Now it ensures synchronized access to fields needed for asynchonous workflow of log files.
This is already causing a timeout when retrieving status from the agent (https://sentinelone.atlassian.net/browse/DTIN-4769) and will be handled in another PR.
The Bug:
When a log file (path, worker_id) is scheduled for removal, the removal is done in two separate steps:
The race condition can be reached when a path is scheduled for removal and the matchers are still finishing.
When add_log_config is called it considers log path to be active and skips adding it. The log file is later removed.
In log we can see sequence like this one: