Fix race condition in ConntrackConnectionStore and FlowExporter #3655
Codecov Report
@@ Coverage Diff @@
## main #3655 +/- ##
==========================================
- Coverage 64.65% 57.10% -7.55%
==========================================
Files 278 392 +114
Lines 39363 54823 +15460
==========================================
+ Hits 25449 31306 +5857
- Misses 11939 21060 +9121
- Partials 1975 2457 +482
Flags with carried forward coverage won't be shown.
Thanks for the helpful PR description.
A couple of typos there, e.g. "We fix it by hold the lock until finish 1&2." instead of "We fix it by holding the lock until we finish 1&2."
LGTM
/test-all
Squash the commits and rewrite the commit message...
/test-all
LGTM except a nit.
/test-all
/test-networkpolicy
/test-e2e
The conntrack connection store's polling goroutine and the flow exporter both access the conntrack connection store, which causes a race condition. In the polling goroutine, `deleteIfStaleOrResetConn` and `AddOrUpdateConn` each acquire the lock, modify the `conn.IsPresent` field, and release the lock. Between the execution of these two functions, the FlowExporter's timer may fire, and the exporter then reads a `conn.IsPresent` value that is in an intermediate state. We fix it by holding the lock until both functions have finished executing.

Fixes: antrea-io#3650

Signed-off-by: heanlan <[email protected]>
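The interleaving described in this commit message can be replayed deterministically. The following is a minimal Go sketch, not Antrea's code: `connStore`, `resetAll`, `addOrUpdate`, and `exporterRead` are hypothetical stand-ins for the store, `deleteIfStaleOrResetConn`, `AddOrUpdateConn`, and the exporter's read path. As in the pre-fix code, each helper takes and releases the lock on its own, so the exporter can run between the two steps.

```go
package main

import "sync"

// Connection is a hypothetical, trimmed-down stand-in for a conntrack
// connection entry; only the field involved in the race is kept.
type Connection struct {
	IsPresent bool
}

// connStore is a hypothetical lock-protected connection map.
type connStore struct {
	mutex sync.Mutex
	conns map[string]*Connection
}

// resetAll mirrors step 1: mark every connection as not present.
// It acquires and releases the lock by itself.
func (cs *connStore) resetAll() {
	cs.mutex.Lock()
	defer cs.mutex.Unlock()
	for _, c := range cs.conns {
		c.IsPresent = false
	}
}

// addOrUpdate mirrors step 2: mark a connection found in the dump as present.
// It also acquires and releases the lock by itself.
func (cs *connStore) addOrUpdate(key string) {
	cs.mutex.Lock()
	defer cs.mutex.Unlock()
	if c, ok := cs.conns[key]; ok {
		c.IsPresent = true
		return
	}
	cs.conns[key] = &Connection{IsPresent: true}
}

// exporterRead mirrors the flow exporter reading a connection under the lock.
func (cs *connStore) exporterRead(key string) bool {
	cs.mutex.Lock()
	defer cs.mutex.Unlock()
	c, ok := cs.conns[key]
	return ok && c.IsPresent
}

// racyPollWithExporterInBetween replays the problematic schedule: the
// exporter runs after step 1 but before step 2, even though the
// connection never left the conntrack table.
func racyPollWithExporterInBetween() (seenDuringPoll, seenAfterPoll bool) {
	cs := &connStore{conns: map[string]*Connection{"flow-a": {IsPresent: true}}}
	// Step 1; the lock is released when it returns.
	cs.resetAll()
	// The exporter's timer fires here and observes the intermediate state.
	seenDuringPoll = cs.exporterRead("flow-a")
	// Step 2 restores IsPresent, but the exporter has already read false.
	cs.addOrUpdate("flow-a")
	seenAfterPoll = cs.exporterRead("flow-a")
	return
}
```

Calling `racyPollWithExporterInBetween()` yields `false` for the read between the steps and `true` afterwards: every individual access is locked, yet the store is still observed in an inconsistent state because the lock is dropped between the two steps.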
/test-all
/test-e2e
/test-e2e
The conntrack connection store's polling goroutine and the flow exporter both access the conntrack connection store, which causes a race condition.

In `func (cs *ConntrackConnectionStore) Poll()`, two things happen in sequence:

1. In `deleteIfStaleOrResetConn`, we acquire the lock, reset `conn.IsPresent = false` for all the connections in the connection map, and then release the lock (`conn.IsPresent` describes whether or not the connection exists in the conntrack table).
2. In `AddOrUpdateConn`, we acquire the lock, set `conn.IsPresent = true`, and then release the lock.

If the flow exporter's timer is triggered between 1 and 2, the exporter grabs the lock and reads a connection with `IsPresent` set to false. In the corresponding exported flow record, `flowEndReason` is then set to 3, indicating that the flow has ended. Here's an antrea-agent log that confirms this error: log

We fix it by holding the lock until we finish both 1 and 2.
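The fix can be sketched in Go as follows. This is an illustrative sketch, not the code merged in this PR: `connStore`, `resetAllLocked`, `addOrUpdateLocked`, `poll`, and `isPresent` are hypothetical names. The helpers assume the caller already holds the lock, and `poll` holds it across both steps, so a concurrent reader (standing in for the flow exporter) never sees `IsPresent == false` for a connection that is still in the conntrack dump.

```go
package main

import "sync"

// Connection is a hypothetical stand-in for a conntrack connection entry.
type Connection struct {
	IsPresent bool
}

// connStore is a hypothetical lock-protected connection map.
type connStore struct {
	mutex sync.Mutex
	conns map[string]*Connection
}

// resetAllLocked marks every connection as not present (step 1).
// The caller must already hold cs.mutex.
func (cs *connStore) resetAllLocked() {
	for _, c := range cs.conns {
		c.IsPresent = false
	}
}

// addOrUpdateLocked marks a connection from the conntrack dump as present
// (step 2). The caller must already hold cs.mutex.
func (cs *connStore) addOrUpdateLocked(key string) {
	if c, ok := cs.conns[key]; ok {
		c.IsPresent = true
		return
	}
	cs.conns[key] = &Connection{IsPresent: true}
}

// poll holds the lock across both steps, so no other goroutine can observe
// the state in which IsPresent has been reset but not yet restored.
func (cs *connStore) poll(dumped []string) {
	cs.mutex.Lock()
	defer cs.mutex.Unlock()
	cs.resetAllLocked()
	for _, key := range dumped {
		cs.addOrUpdateLocked(key)
	}
}

// isPresent plays the role of the flow exporter reading under the lock.
func (cs *connStore) isPresent(key string) bool {
	cs.mutex.Lock()
	defer cs.mutex.Unlock()
	c, ok := cs.conns[key]
	return ok && c.IsPresent
}

// countFalseReads runs pollRounds polls while a reader goroutine repeatedly
// checks a connection that is always in the dump, and returns how many times
// the reader saw it as not present.
func countFalseReads(pollRounds int) int {
	cs := &connStore{conns: map[string]*Connection{"flow-a": {IsPresent: true}}}
	dumped := []string{"flow-a"}
	done := make(chan struct{})
	var wg sync.WaitGroup
	falseReads := 0

	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case <-done:
				return
			default:
				if !cs.isPresent("flow-a") {
					falseReads++
				}
			}
		}
	}()

	for i := 0; i < pollRounds; i++ {
		cs.poll(dumped)
	}
	close(done)
	// falseReads is only read after the reader goroutine has exited.
	wg.Wait()
	return falseReads
}
```

Because the reset and the restore now happen inside a single critical section, `countFalseReads` returns 0 regardless of how the scheduler interleaves the two goroutines. The trade-off is that the exporter may wait slightly longer for the lock while a poll is in progress.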
The observation comes from the error log of the flowaggregator e2e test. In the record, `flowEndReason` was set to 3, so the test treated the record as the last record. Pointer to test code. The test computed the throughput value as totalByteCount/iperfTimeSec instead of reading the throughput field in the record, which has the correct value.

Fixes: #3650
Signed-off-by: heanlan [email protected]
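To make the effect on the e2e test concrete, here is a minimal Go sketch of the throughput check described above. It does not reproduce the actual test code: `flowRecord`, `isLastRecord`, and `throughputFromFirstFinalRecord` are hypothetical, and it assumes that an intermediate record's byte count only covers the traffic seen up to that export. The value 3 is the IPFIX `flowEndReason` code for "end of Flow detected".

```go
package main

// flowEndReasonEndOfFlow is the IPFIX flowEndReason value 0x03
// ("end of Flow detected").
const flowEndReasonEndOfFlow = 3

// flowRecord is a hypothetical, trimmed-down exported flow record.
type flowRecord struct {
	FlowEndReason  uint8
	TotalByteCount uint64 // bytes observed for the flow up to this export
}

// isLastRecord mirrors the test's check: a record whose flowEndReason is
// "end of Flow detected" is treated as the final record of the flow.
func isLastRecord(r flowRecord) bool {
	return r.FlowEndReason == flowEndReasonEndOfFlow
}

// throughputFromFirstFinalRecord returns totalByteCount / iperfTimeSec for
// the first record treated as final, and false if there is none.
func throughputFromFirstFinalRecord(records []flowRecord, iperfTimeSec float64) (float64, bool) {
	for _, r := range records {
		if isLastRecord(r) {
			return float64(r.TotalByteCount) / iperfTimeSec, true
		}
	}
	return 0, false
}
```

With made-up numbers, a 10-second iperf run that transfers 1,000,000,000 bytes should give 100,000,000 bytes/s. If an intermediate record exported halfway (500,000,000 bytes) wrongly carries `flowEndReason` 3, the computation stops there and reports 50,000,000 bytes/s, half the real throughput.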