-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[xcvrd] Enhance xcvrd to handle new system level event/error #39
Conversation
sonic-xcvrd/scripts/xcvrd
Outdated
for key, value in port_dict.iteritems(): | ||
if key == EVENT_ON_ALL_SFP: | ||
logger.log_error("Receive event with unexpected content, key: {}, value: {}".format(key, value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Receive [](start = 42, length = 7)
Past tense is better. As original 'Got'. Many time below. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
sonic-xcvrd/scripts/xcvrd
Outdated
for key, value in port_dict.iteritems(): | ||
if key == EVENT_ON_ALL_SFP: | ||
if value == SYSTEM_NOT_READY: | ||
# system not ready, wait and retry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[](start = 32, length = 1)
wrong indentation. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
sonic-xcvrd/scripts/xcvrd
Outdated
if value == SYSTEM_NOT_READY: | ||
# system not ready, wait and retry | ||
if self._retry_for_system_ready(): | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[](start = 33, length = 3)
inconsistent indentation #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
sonic-xcvrd/scripts/xcvrd
Outdated
while retry < RETRY_TIMES_FOR_SYSTEM_READY: | ||
#time.sleep(RETRY_PERIOD_FOR_SYSTEM_READY_SECS) | ||
time_start = time.time() | ||
rc, event_dict = platform_sfputil.get_transceiver_change_event(RETRY_PERIOD_FOR_SYSTEM_READY_MSECS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
platform_sfputil.get_transceiver_change_event [](start = 29, length = 45)
You can call platform_sfputil.get_transceiver_change_event function only once, and implement the logic as a normal state machine.
I don't mean your implementation is wrong, state machine may be much more readable and future proof. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Qi,
I admit that rewriting it in a state machine approach can make the code a little shorter by eliminating some duplicate code.
But I'm afraid it can also introduce negative effects: We have the logic to control retry timespan and times and introduce some local variables to control that. Rewriting it as a state machine will combine the retry control logic into the main procedure of xcvrd, which make the main function complicated. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The benefit for state machine pattern is not shorter code, but a correct design and implementation.
For the design part, you need to consider any sequence of change events returned by vendor plugin code. For example, SYSTEM_NOT_READY followed by an inserted event. You need to draw the state machine on paper first to consider all the events at each state. You need to document the state machine diagram first to consider all the events at each state.
For the implementation part, you could design you state transition code in standalone function, and achieve 'control retry timespan and times' or 'local variables'.
In reply to: 310075723 [](ancestors = 310075723)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Qi,
I understand the advantage of state machine approach. I just feel that the logic of xcvrd daemon is relatively simple and it seems that using state machine is less necessary.
IMO, state machine is adopted in event handling systems which have a lot of state and event, like network protocols. In those systems using state machine makes the flow clear and relatively easy to understand and also benefits the formal verification of that kind of system.
But for systems with less states and events the state machine approach doesn't have overwhelming advantage, since the current logic is straightforward. I have considered the state machine approach and found that only 2 states and 4 events. In this sense it's less necessary to go in this way.
meanwhile, I've created a PR in which I've updated the handling logic in a state-machine-like way, I think it is better than the current implementation. but it's not a strictly-speaking state machine. the pr is sonic-platform-daemons PR#5. You can check it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As comments
Also blocked on sonic-net/sonic-platform-common#50
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed again. Pending on #39 (comment)
…e state to "SYSTEM_READY"
Retry on fail state machine approach
code updated with state machine approach
events definition
State transmit:
State event next state |
sonic-xcvrd/scripts/xcvrd
Outdated
status, port_dict = platform_sfputil.get_transceiver_change_event(timeout) | ||
logger.log_debug("Got event {} {} in state {}".format(status, port_dict, state)) | ||
event = self._mapping_event_from_change_event(status, port_dict) | ||
for key, value in port_dict.iteritems(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for key, value in port_dict.iteritems(): [](start = 12, length = 40)
Why you put the for-loop outside the event/state branches? Can you move it inside a smaller branch? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. You are correct.
Only for NORMAL_EVENT is this for loop required.
The "break" statements which originally break the inner "for" loop are also removed.
sonic-xcvrd/scripts/xcvrd
Outdated
SYSTEM_FAIL = 'system_fail' | ||
NORMAL_EVENT = 'normal' | ||
# states definition | ||
STATE_INIT = 'INIT STATE' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'INIT STATE' [](start = 13, length = 12)
You can use integer as value, don't rely on python to optimize the string comparison. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think to use string is better since it makes logging information more human-readable.
Even though a dictionary can be adopted to map from integer value to string, the case that the state is not in the key set of the dictionary has to be handled. That's too complicated.
BTW, I'm not sure whether to use string address to compare the string is part of python's language specification or just optimization that depends on python interpreter's implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
states have been updated to integer value.
The events, like SYSTEM_FAIL, are already used by the plugins. In this sence, we'd better not to change them. It seems that to use string has already been a common practice in plugins, like the normal plugin/out events which are represented by '1', '0' respectively.
sonic-xcvrd/scripts/xcvrd
Outdated
# If get_transceiver_change_event() return error, will clean up the DB and then exit | ||
# TODO: next step need to define more error types to handle accordingly. | ||
logger.log_error("Failed to get transceiver change event. Exiting...") | ||
next_state = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next_state [](start = 12, length = 10)
You may initialize it with state, next_state will never be None. #Closed
sonic-xcvrd/scripts/xcvrd
Outdated
|
||
if unexpected_state: | ||
next_state = STATE_EXIT | ||
logger.log_error("Facing unexpected state {}, exiting...".format(state)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove unexpected_state. This is redundant logic. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only for log.
By setting unexpected_state to true we can know why the state machine reach STATE_EXIT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only for log.
By setting unexpected_state to true we can know why the state machine reach STATE_EXIT.
I decide to remove this. It is very unlikely to enter an unexpected state since all states are handled for each event.
sonic-xcvrd/scripts/xcvrd
Outdated
else: | ||
unexpected_state = True | ||
next_state = STATE_EXIT | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
repeated code, you can extract function. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now, only next_state is set when facing unexpected state. in this sense, no need to extract function.
os.kill(os.getppid(), signal.SIGTERM) | ||
break | ||
elif next_state == STATE_NORMAL: | ||
timeout = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can move this part into event handling branches. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to set it here since it can ensure that wherever the state reaches STATE_NORMAL the timeout can be properly set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check my new comment, and also please resolve merge conflict for this PR.
sonic-xcvrd/scripts/xcvrd
Outdated
logger.log_error("Failed to get transceiver change event. Exiting...") | ||
logger.log_warning("Got unknown event {} on state {}.".format(event, state)) | ||
|
||
if not next_state == state: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not next_state == state: [](start = 12, length = 27)
For readability, use if next_state != state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
[sonic-platform-common] [sonic_sfp] Interpret sff 'int' element =0 as valid value (sonic-net/sonic-platform-common#51) add more error code to get_transceiver_change_event ((sonic-net/sonic-platform-common#50) [sonic_platform_base] support new-platform-api-based daemons ((sonic-net/sonic-platform-common#48) sync change to sonic_platform_base/sonic_sfp and create symbol link ((sonic-net/sonic-platform-common#49) Add parser support for Tx_RxLos,TxFault, PowerControl, ResetStatus in sff8436.py ((sonic-net/sonic-platform-common#45) readd type_abbrv_name in sonic_sfp/sff8436.py ((sonic-net/sonic-platform-common#44) [psu_base] get_status_led() returns current state of the status LED ((sonic-net/sonic-platform-common#39) Fix abbrv name for OSFP ((sonic-net/sonic-platform-common#36) [sff8436] support "Control Bytes" and "Options" ((sonic-net/sonic-platform-common#38) sonic_sfp: avoid possible key error in get_physical_to_logical() ((sonic-net/sonic-platform-common#37) [sonic-platform-daemons] [xcvrd] Enhance xcvrd to handle new system level event/error (sonic-net/sonic-platform-daemons#39) [xcvrd] Support both new platform API and old platform plugins (sonic-net/sonic-platform-daemons#38) [psud] Support both new platform API and old platform plugins (sonic-net/sonic-platform-daemons#37) [syseepromd] Support both new platform API and old platform plugins (sonic-net/sonic-platform-daemons#36) Add missing import statemet (sonic-net/sonic-platform-daemons#32) sonic_xcvrd: Support for DOM Threshold values for EEPROM dump (sonic-net/sonic-platform-daemons#29)
[xcvrd] backport PR(#39) "Enhance xcvrd to handle new system level event/error" to 201811
…ic-net#3333) [sonic-platform-common] [sonic_sfp] Interpret sff 'int' element =0 as valid value (sonic-net/sonic-platform-common#51) add more error code to get_transceiver_change_event ((sonic-net/sonic-platform-common#50) [sonic_platform_base] support new-platform-api-based daemons ((sonic-net/sonic-platform-common#48) sync change to sonic_platform_base/sonic_sfp and create symbol link ((sonic-net/sonic-platform-common#49) Add parser support for Tx_RxLos,TxFault, PowerControl, ResetStatus in sff8436.py ((sonic-net/sonic-platform-common#45) readd type_abbrv_name in sonic_sfp/sff8436.py ((sonic-net/sonic-platform-common#44) [psu_base] get_status_led() returns current state of the status LED ((sonic-net/sonic-platform-common#39) Fix abbrv name for OSFP ((sonic-net/sonic-platform-common#36) [sff8436] support "Control Bytes" and "Options" ((sonic-net/sonic-platform-common#38) sonic_sfp: avoid possible key error in get_physical_to_logical() ((sonic-net/sonic-platform-common#37) [sonic-platform-daemons] [xcvrd] Enhance xcvrd to handle new system level event/error (sonic-net/sonic-platform-daemons#39) [xcvrd] Support both new platform API and old platform plugins (sonic-net/sonic-platform-daemons#38) [psud] Support both new platform API and old platform plugins (sonic-net/sonic-platform-daemons#37) [syseepromd] Support both new platform API and old platform plugins (sonic-net/sonic-platform-daemons#36) Add missing import statemet (sonic-net/sonic-platform-daemons#32) sonic_xcvrd: Support for DOM Threshold values for EEPROM dump (sonic-net/sonic-platform-daemons#29)
This PR has a dependency on PRs:
sonic-net/sonic-platform-common#50
sonic-net/sonic-buildimage#3261
In the above PRs introduced the new system-level event, this PR to enhance xcvrd to be more robust by handling the more new system-level event:
wait for a certain period on system_not_ready event to overcome race condition when xcvrd started but system not ready.
start to routing work on system_become_ready event, this event indicates that the system is ready for SFP change event report.
exit on system_fail event