Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sfputil] Firmware download/upgrade CLI support for QSFP-DD (#1947) #2349

Merged
merged 6 commits into from
Jan 9, 2023
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
52 changes: 52 additions & 0 deletions sfputil/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -1112,6 +1112,51 @@ def run_firmware(port_name, mode):

return status

def is_fw_switch_done(port_name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why this API is required because there is already a return status from run_firmware() to indicate a failure. An imageA could be running while the committed Image could be Image B. For eg.

  1. Image A is running and its is the committed image
  2. User updates Image B
  3. Runs the image B

Now the running image is B but the committed image is A

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"run_firmware" indicates the status which read before reset to target image. This status only means module received and done the judgement for "run_firmware" cmd, but not means module already jumped into target image.
"run_firmware" will maybe involved non-volatile memory operation in module according to module implementation, so that it will maybe spend several or tens of seconds. That will cause commit failed if sends "commit_firmware" once "run_firmware" done without "is_fw_switch_done".
imageCheckPurpose

"""
Make sure the run_firmware cmd is done
@port_name:
Returns 1 on success, and exit_code = -1 on failure
"""
status = 0
physical_port = logical_port_to_physical_port_index(port_name)
sfp = platform_chassis.get_sfp(physical_port)

try:
api = sfp.get_xcvr_api()
except NotImplementedError:
click.echo("This functionality is currently not implemented for this platform")
sys.exit(ERROR_NOT_IMPLEMENTED)

try:
MAX_WAIT = 600 # 60s timeout.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we read from the module what is advertised instead of hardcoding here?

Copy link
Contributor Author

@CliveNi CliveNi Oct 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still no definition of max duration for CDB_run cmd until CMIS5.2.
CDB_Duration

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CliveNi why 60 secs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For more compatible with different module, timeout time should be the worst case. 60s is calculated for different implementation.
Feasible implementation as below,

  1. Dual image(A and B) is supported, and fw only executable at specific image.
  2. Dual image(A and B) is supported, and fw executable at both image.
  3. Triple image(A, B and factory) is supported, and fw only executable at specific image.
  4. Triple image(A, B and factory) is supported, and fw executable at all image.
  5. Single image is supported.

Case 1 and 3 will spend the most time to switch image. It will need to copy one image to non-volatile buffer, then interchange two images with each other. That will need 3 cycles erasing and programming on non-volatile memory.

We can roughly assume all of fw component in module around 1MBytes. A common M0~M4 MCU needs around 40ms per page(2KBytes) for non-volatile erasing and programming, so that the calculation as below.
1MBytes / 2KBytes * 40ms * 3cycles = 61.44s

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, if a firmware size is >> 1MB (e.g. close to 2 MB), does that mean 60 sec wait time would not be enough?

Copy link
Contributor Author

@CliveNi CliveNi Jan 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, 1MB is a roughly number, but I think the margin is enough for different DSP solutions now. Actually, it only uses around 500KB totally in our 400/800G implementation.

fw_info = api.get_module_fw_info()
cnt = 0
is_busy = 1 if (fw_info['status'] == False) and (fw_info['result'] is not None) else 0
Copy link
Contributor

@qinchuanares qinchuanares Jan 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you show examples of fw_info 1) when a module is busy and 2) when a module shows faulty status?

Copy link
Contributor Author

@CliveNi CliveNi Jan 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"fw_info" is returned by "get_module_fw_info()" which is in cmis.py of "sonic-platform-common". It will return 3 combination patterns of status and result as below.

  1. {['status']: False, ['result']: None} : Cdb features not supported by switch, or wrong chkcode got from module. Do NOT need to recheck in these cases.
  2. {['status']: False, ['result']: 0} : I2C NACK or other interface issue. Recheck module fw info until timeout in this case.
  3. {['status']: True, ['result']: (ImageA & B information)} : Valid information got from module, go ahead to handle the return value.

The unit test function "test_is_fw_switch_done" in sfputil_test.py will also test all these cases.

while is_busy and cnt < MAX_WAIT:
time.sleep(0.1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May not need to check this frequently. It also brings a lot of workload to the module. Checking every 1 or 2 seconds should be OK without causing too much time delay.

Copy link
Contributor Author

@CliveNi CliveNi Jan 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, already revised to check every 2 seconds.

fw_info = api.get_module_fw_info()
is_busy = 1 if (fw_info['status'] == False) and (fw_info['result'] is not None) else 0
cnt += 1

if fw_info['status'] == True:
(ImageA, ImageARunning, ImageACommitted, ImageAValid,
ImageB, ImageBRunning, ImageBCommitted, ImageBValid) = fw_info['result']

if (ImageARunning == 1) and (ImageACommitted == 0): # ImageA is running, but not committed.
status = 1 # run_firmware is done.
elif (ImageBRunning == 1) and (ImageBCommitted == 0): # ImageB is running, but not committed.
status = 1 # run_firmware is done.
else: # No image is running, or running and committed image is same.
status = -1 # Failure for Switching images.
else:
status = -1 # Timeout or check code error or CDB not supported.

except NotImplementedError:
click.echo("This functionality is not applicable for this transceiver")

return status

def commit_firmware(port_name):
status = 0
physical_port = logical_port_to_physical_port_index(port_name)
Expand Down Expand Up @@ -1280,6 +1325,13 @@ def upgrade(port_name, filepath):

click.echo("Firmware run in mode 1 successful")

status = is_fw_switch_done(port_name)
if status != 1:
click.echo('Failed to switch firmware images!')
sys.exit(EXIT_FAIL)

click.echo("Firmware images switch successful")

status = commit_firmware(port_name)
if status != 1:
click.echo('Failed to commit firmware! CDB status: {}'.format(status))
Expand Down