Fix flow counter out-of-order issue by notifying counter operations using SelectableChannel #1362
Merged: kcudnik merged 12 commits into sonic-net:master from stephenxs:poc-flexcounter-new-infra on Mar 26, 2024
+482
−12
Signed-off-by: Stephen Sun <[email protected]>
Signed-off-by: Stephen Sun <[email protected]>
stephenxs force-pushed the poc-flexcounter-new-infra branch from e7b3dc1 to 07aa1cd on March 13, 2024 12:38
kcudnik requested changes on Mar 13, 2024
Signed-off-by: Stephen Sun <[email protected]>
Signed-off-by: Stephen Sun <[email protected]>
Signed-off-by: Stephen Sun <[email protected]>
Signed-off-by: Stephen Sun <[email protected]>
Signed-off-by: Stephen Sun <[email protected]>
Signed-off-by: Stephen Sun <[email protected]>
Signed-off-by: Stephen Sun <[email protected]>
Signed-off-by: Stephen Sun <[email protected]>
please satisfy code coverage test
Signed-off-by: Stephen Sun <[email protected]>
kcudnik requested changes on Mar 25, 2024
Signed-off-by: Stephen Sun <[email protected]>
kcudnik approved these changes on Mar 25, 2024
Hi @kcudnik
byu343 added a commit to byu343/sonic-sairedis that referenced this pull request on Sep 18, 2024
The counters for syncd (switch chip) were attempted to be added to gbsyncd (gearbox phys), and vice versa. This issue was introduced by sonic-net#1362. When setting the redis attributes SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP and SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER, the operation is applied to every context (both syncd and gbsyncd). However, the counters to initialize can exist in only one context. The fix is to check that the target switch id exists in the context; if not, skip the operation.
byu343 added a commit to byu343/sonic-sairedis that referenced this pull request on Oct 8, 2024
The counters for syncd (switch chip) were attempted to be added to gbsyncd (gearbox phys), and vice versa. This issue was introduced by sonic-net#1362. When setting the redis attributes SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP and SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER, the operation is applied to every context (both syncd and gbsyncd). However, the counters to initialize can exist in only one context. The fix is to check that the target switch id exists in the context; if not, skip the operation.
What I did
Fix flow counter out-of-order issue by notifying counter operations using SelectableChannel
Signed-off-by: Stephen Sun [email protected]
Why I did it
Currently, operations on SAI objects and on their counters (if any) are triggered through different channels, which introduces race conditions:
- SAI object operations are notified through the SelectableChannel.
- Counter operations are notified through the FLEX_COUNTER and FLEX_COUNTER_GROUP tables in the FLEX_COUNTER_DB.
As a result, syncd can receive events in the wrong order. For example, if it receives the destruction of an object before the request to stop counter polling on that object, it can poll counters for a non-existent object, which causes errors in the vendor SAI.
The new solution extends the SAI redis attributes on the SAI_SWITCH_OBJECT to notify counter polling. All objects and their counters are then notified through one unified channel, the SelectableChannel.
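The ordering problem described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `EventKind`, `Event`, `countStalePolls`, and the two arrival helpers are hypothetical names made up for illustration and are not sairedis or swss identifiers. The sketch replays events in arrival order and counts removals that happen while polling is still active.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical event kinds; illustrative only, not sairedis identifiers.
enum class EventKind { CreateObject, StartPoll, StopPoll, RemoveObject };

struct Event
{
    EventKind kind;
    std::uint64_t oid;
};

// Replays events in arrival order and counts how often an object is removed
// while its counters are still being polled. Each such case is a window in
// which the poller could query a non-existent object.
int countStalePolls(const std::vector<Event>& arrivals)
{
    std::set<std::uint64_t> live;
    std::set<std::uint64_t> polled;
    int stale = 0;

    for (const Event& e : arrivals)
    {
        switch (e.kind)
        {
            case EventKind::CreateObject: live.insert(e.oid); break;
            case EventKind::StartPoll:    polled.insert(e.oid); break;
            case EventKind::StopPoll:     polled.erase(e.oid); break;
            case EventKind::RemoveObject:
                if (polled.count(e.oid) != 0)
                {
                    ++stale; // polling is still active for a removed object
                }
                live.erase(e.oid);
                break;
        }
    }
    return stale;
}

// The sender issues: create, start poll, stop poll, remove.
// A single FIFO channel delivers the events in exactly this order.
std::vector<Event> unifiedChannelArrivals(std::uint64_t oid)
{
    return {
        {EventKind::CreateObject, oid},
        {EventKind::StartPoll, oid},
        {EventKind::StopPoll, oid},
        {EventKind::RemoveObject, oid},
    };
}

// With two independent channels, the remove (object channel) may overtake
// the stop (counter channel).
std::vector<Event> splitChannelArrivals(std::uint64_t oid)
{
    return {
        {EventKind::CreateObject, oid},
        {EventKind::StartPoll, oid},
        {EventKind::RemoveObject, oid},
        {EventKind::StopPoll, oid},
    };
}
```

In this model, the unified ordering yields no stale polls, while the reordered arrival yields one.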
How I verified it
Unit test
Manual test
Regressions test
Details if related
Two SAI Redis attributes are introduced, as described below. Each attribute carries several fields of type const char *. Passing nullptr for a field means that the field is left unchanged.
SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP, for counter groups represented by the FLEX_COUNTER_GROUP table in the FLEX_COUNTER_DB, includes the following fields:
- counter_group_name: the key of the table, representing the group name.
- poll_interval: the POLL_INTERVAL field of an entry, representing the polling interval of the group.
- operation: the FLEX_COUNTER_STATUS field of an entry, representing whether counter polling is enabled for the group.
- stats_mode: the STATS_MODE field of an entry, either STATS_MODE_READ or STATS_MODE_READ_AND_CLEAR.
- plugins: the Lua plugin related to the group.
- plugin_name: the name of the plugins field. It differs among groups.
SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER, for counters represented by the FLEX_COUNTER table in the FLEX_COUNTER_DB, includes the following fields:
- counter_key: the key of the table, following the naming convention <group-name>:oid:<oid-value>.
- counter_ids: a list of counter IDs to be polled for the object.
- counter_field_name: the name of the counter ID field. It differs among groups.
- stats_mode: the STATS_MODE field of an entry, either STATS_MODE_READ or STATS_MODE_READ_AND_CLEAR.
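A minimal sketch of the "nullptr means unchanged" convention and of the counter key naming convention follows. It assumes a hypothetical `FlexCounterGroupParams` struct that mirrors the group fields listed above; the actual structure in sairedis may be declared differently. `TableEntry`, `applyGroupParams`, and `makeCounterKey` are likewise illustrative helpers, not part of the library.

```cpp
#include <map>
#include <string>

// Hypothetical mirror of the group attribute's fields; the real sairedis
// structure may differ. Every field defaults to nullptr ("do not change").
struct FlexCounterGroupParams
{
    const char* counter_group_name = nullptr;
    const char* poll_interval = nullptr;
    const char* operation = nullptr;
    const char* stats_mode = nullptr;
    const char* plugin_name = nullptr;
    const char* plugins = nullptr;
};

// Illustrative stand-in for one FLEX_COUNTER_GROUP table entry.
using TableEntry = std::map<std::string, std::string>;

// Writes only the fields that were provided; nullptr fields are skipped.
void applyGroupParams(const FlexCounterGroupParams& p, TableEntry& entry)
{
    auto setIfPresent = [&entry](const std::string& field, const char* value)
    {
        if (value != nullptr)
        {
            entry[field] = value;
        }
    };

    setIfPresent("POLL_INTERVAL", p.poll_interval);
    setIfPresent("FLEX_COUNTER_STATUS", p.operation);
    setIfPresent("STATS_MODE", p.stats_mode);

    // The plugin field name varies per group, so it is supplied by the caller.
    if (p.plugin_name != nullptr && p.plugins != nullptr)
    {
        entry[p.plugin_name] = p.plugins;
    }
}

// Builds a FLEX_COUNTER key following <group-name>:oid:<oid-value>.
std::string makeCounterKey(const std::string& groupName, const std::string& oidValue)
{
    return groupName + ":oid:" + oidValue;
}
```

With this shape, a caller that wants to enable polling without touching the interval sets only `operation` and leaves every other field at its default.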
Both SAI attributes are terminated by the RedisRemoteSaiInterface object in the swss context, which serializes the SAI API call into the selectable channel as one of the following extended commands:
- REDIS_FLEX_COUNTER_COMMAND_SET_COUNTER_GROUP: represents the SET operation in the FLEX_COUNTER_GROUP table
- REDIS_FLEX_COUNTER_COMMAND_DEL_COUNTER_GROUP: represents the DEL operation in the FLEX_COUNTER_GROUP table
- REDIS_FLEX_COUNTER_COMMAND_START_POLL: represents the SET operation in the FLEX_COUNTER table
- REDIS_FLEX_COUNTER_COMMAND_STOP_POLL: represents the DEL operation in the FLEX_COUNTER table
On receiving these extended commands (which together cover both extended SAI attributes), syncd calls the flex counter functions to handle them.
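The mapping between the four commands and the table operations they stand for can be sketched as a lookup. The enum `FlexCounterCommand` below is a hypothetical stand-in for the REDIS_FLEX_COUNTER_COMMAND_* values (their actual definitions are not shown in this description), and `toTableOperation` is an illustrative helper rather than a syncd function.

```cpp
#include <string>

// Hypothetical stand-in for the REDIS_FLEX_COUNTER_COMMAND_* values.
enum class FlexCounterCommand
{
    SetCounterGroup,
    DelCounterGroup,
    StartPoll,
    StopPoll
};

// The legacy table and operation that a command represents.
struct TableOperation
{
    std::string table;
    std::string op;
};

// Translates an extended command into the equivalent table operation.
TableOperation toTableOperation(FlexCounterCommand cmd)
{
    switch (cmd)
    {
        case FlexCounterCommand::SetCounterGroup: return {"FLEX_COUNTER_GROUP", "SET"};
        case FlexCounterCommand::DelCounterGroup: return {"FLEX_COUNTER_GROUP", "DEL"};
        case FlexCounterCommand::StartPoll:       return {"FLEX_COUNTER", "SET"};
        case FlexCounterCommand::StopPoll:        return {"FLEX_COUNTER", "DEL"};
    }
    return {"", ""}; // not reached for valid enum values
}
```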
Gearbox flex counter database
When calling the SAI set API to set the extended attributes, pass the PHY OID, which is syntactically the OID of a SAI switch object. This lets the SAI redis objects choose the context in which the SAI API call is invoked, so that the corresponding gearbox syncd docker container handles it.
(ps: THE ORIGINAL GEARBOX FLEX COUNTER IMPLEMENTATION IS BUGGY)
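Choosing a context by switch OID could look roughly like the sketch below. Everything here is an assumption made for illustration: `Context`, `selectContext`, and the idea that each context tracks a set of switch IDs are hypothetical and do not describe the actual sairedis classes.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a context (for example syncd or gbsyncd) and the
// switch object IDs it owns.
struct Context
{
    std::string name;
    std::set<std::uint64_t> switchIds;
};

// Returns the context that owns the given switch (or PHY) OID, or nullptr
// when no context knows the OID, in which case the call should be skipped.
const Context* selectContext(const std::vector<Context>& contexts, std::uint64_t switchId)
{
    for (const Context& c : contexts)
    {
        if (c.switchIds.count(switchId) != 0)
        {
            return &c;
        }
    }
    return nullptr;
}
```

Returning nullptr for an unknown OID keeps a counter operation from being applied to a context that does not own the target switch.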
Context and critical section analysis
This change does not alter the critical section hierarchy.
Performance analysis
The counter operations are handled in the same thread in both the new and old solutions.
In swss, the counter operation was asynchronous in the old solution and is synchronous now, which can introduce a bit more latency. However, as the number of counter operations is small, no performance degradation is observed.