Improving thread safety of K8s informer DB #1118
Conversation
Codecov Report
Attention: Patch coverage is

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main    #1118   +/-  ##
=======================================
  Coverage   81.99%   81.99%
=======================================
  Files         140      140
  Lines       11574    11570     -4
=======================================
- Hits         9490     9487     -3
+ Misses       1560     1558     -2
- Partials      524      525     +1
```

Flags with carried forward coverage won't be shown.
LGTM! Nice catch! I added a few remarks that I think should make things even safer.
```diff
 }
 }
 }

 func (id *Database) addProcess(ifp *container.Info) {
 	id.deletePodCache(ifp.PIDNamespace)
-	id.nsMut.Lock()
+	id.access.Lock()
```
Maybe safer to do a `defer unlock()` here?
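For illustration, a minimal sketch of the suggested pattern, using hypothetical type and field names (`containerInfo`, `byPIDNs`) rather than the actual Beyla ones:

```go
package kube

import "sync"

// Hypothetical stand-ins for the real types in pkg/internal/transform/kube/db.go;
// only the mutex and one guarded map matter for this point.
type containerInfo struct {
	PIDNamespace uint32
}

type Database struct {
	access  sync.Mutex
	byPIDNs map[uint32]*containerInfo
}

// Pairing Lock with a deferred Unlock releases the mutex on every return
// path, including panics, so an early return added later inside the
// critical section cannot leave the lock held.
func (id *Database) addProcess(ifp *containerInfo) {
	id.access.Lock()
	defer id.access.Unlock()
	id.byPIDNs[ifp.PIDNamespace] = ifp
}
```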
```diff
@@ -128,36 +119,28 @@ func (id *Database) AddProcess(pid uint32) {
 }

 func (id *Database) CleanProcessCaches(ns uint32) {
+	id.access.Lock()
```
Defer here too?
pkg/internal/transform/kube/db.go (outdated)
```diff
 	for _, ip := range pod.IPInfo.IPs {
 		id.podsByIP[ip] = pod
 	}
+	id.access.Lock()
```
I think this can lead to a race too: we check the pod state before we lock. Let's wrap the whole function in a lock.
In this case it's not needed, because the mutex protects access to the maps, and the pod value is provided from outside the database, from a value that is not concurrently modified.
Anyway, let's move the lock outside the block to make it clearer, as the pods should always have IPs anyway.
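For context, a sketch of the two shapes being discussed, with hypothetical names (`podInfo`, `updateCheckThenLock`, `updateLockThenCheck`); the first checks the value before taking the lock, the second takes the lock first:

```go
package kube

import "sync"

// Hypothetical shapes illustrating the two lockings under discussion;
// names mirror the PR but are not the exact Beyla code.
type podInfo struct {
	IPs []string
}

type Database struct {
	access   sync.Mutex
	podsByIP map[string]*podInfo
}

// Check outside, lock inside: if the checked value could be mutated
// concurrently, the decision may already be stale when the lock is taken.
func (id *Database) updateCheckThenLock(pod *podInfo) {
	if len(pod.IPs) > 0 {
		id.access.Lock()
		defer id.access.Unlock()
		for _, ip := range pod.IPs {
			id.podsByIP[ip] = pod
		}
	}
}

// Lock first, then check: the test and the map updates form one atomic
// critical section, which is the clearer shape agreed on in this thread.
func (id *Database) updateLockThenCheck(pod *podInfo) {
	id.access.Lock()
	defer id.access.Unlock()
	if len(pod.IPs) > 0 {
		for _, ip := range pod.IPs {
			id.podsByIP[ip] = pod
		}
	}
}
```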
pkg/internal/transform/kube/db.go (outdated)
```diff
-	defer id.podsMut.Unlock()
 	for _, ip := range pod.IPInfo.IPs {
 		delete(id.podsByIP, ip)
+	id.access.Lock()
```
Same here, we should check the pod state inside the lock.
```diff
@@ -203,30 +209,28 @@ func (id *Database) UpdateNewServicesByIPIndex(svc *kube.ServiceInfo) {

 func (id *Database) UpdateDeletedServicesByIPIndex(svc *kube.ServiceInfo) {
 	if len(svc.IPInfo.IPs) > 0 {
```
Check the pod state inside a lock.
pkg/internal/transform/kube/db.go (outdated)
```diff
 	return id.podsByIP[ip]
 }

 func (id *Database) UpdateNewServicesByIPIndex(svc *kube.ServiceInfo) {
 	if len(svc.IPInfo.IPs) > 0 {
-		id.svcMut.Lock()
-		defer id.svcMut.Unlock()
+		id.access.Lock()
```
Also check the pod state inside the lock.
Should minimize the impact of #1117
The Kubernetes database was micro-optimized (the "root of all evil") by guarding each kind of data with its own mutex. That avoided Go race-detector complaints without blocking the whole database each time a single element was accessed (e.g. you could access service and pod info simultaneously).
However, that led to actual race conditions when, for example, the Pod information was updated without locking access to its containers and PID namespaces.
The frequency of writes to the kube database is not very high, so controlling all access through a single mutex should not have a noticeable impact on performance.
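As an illustration of the resulting design, a minimal sketch with hypothetical field and type names: one mutex guards all the indexes, so a write that touches several maps at once is atomic with respect to readers.

```go
package kube

import "sync"

// Sketch of a single-mutex database (field and type names are illustrative,
// not the exact Beyla ones): every index is guarded by the same lock, so an
// update touching several maps at once can never be observed half-applied.
type podInfo struct{ IPs []string }
type serviceInfo struct{ IPs []string }

type Database struct {
	access sync.Mutex

	podsByIP     map[string]*podInfo
	servicesByIP map[string]*serviceInfo
	podsByPIDNs  map[uint32]*podInfo
}

// A typical read path: the coarse lock keeps the read consistent with any
// in-flight writer, at the cost of serializing reads with writes.
func (id *Database) PodByIP(ip string) *podInfo {
	id.access.Lock()
	defer id.access.Unlock()
	return id.podsByIP[ip]
}
```

Given how rarely the informers write, a plain Mutex is enough here; a sync.RWMutex would be the usual next step if read contention ever showed up in profiles.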