Skip to content

Commit

Permalink
Cleanup flow of error handling and retries for OTA updates (#17072)
Browse files Browse the repository at this point in the history
* - Added CHIP_ERROR_MAX_RETRY_EXCEEDED. Changed UpdateNotFound() to return CHIP_ERROR value.
- Reversed order of state change call and action calls.

* - Remove un-used GenericOTARequestorDriver::HandleError()

* - Added GenericOTARequestorDriver::ScheduleRetry()

* - Move DefaultOTARequestorDriver::ScheduleRetry() from public to protected API
- Re-instate DefaultOTARequestorDriver::HandleError() to handle re-tries on errors.

* scripts/helpers/restyle-diff.sh

* - Changed mServer NULL ptr check to VerifyOrDie() check
- If UpdateFailureState::kQuerying error is CHIP_ERROR_BUFFER_TOO_SMALL, set to idle.  Else retry.

* - Remove un-used mUpdateNotFoundReason and mRetryDelaySec from DefaultOTARequestorDriver

* - In DefaultOTARequestorDriver::SendQueryImage(), change state to currentUpdateState

* - Renamed ScheduleRetry to ScheduleQueryRetry()
- Updated CHIPError.cpp for CHIP_ERROR_MAX_RETRY_EXCEEDED
- In DefaultOTARequestorDriver::ScheduleQueryRetry(), moved VerifyOrDie NULL check on mRequestor into the block in which it is used.
- Added function header comment for ScheduleQueryRetry()

* - In DefaultOTARequestor::OnQueryImageResponse(), moved definition of status into OTAQueryStatus::kBusy and OTAQueryStatus::kNotAvailable blocks.
- Removed un-used UpdateFailureState::kAwaitingNextAction enum
- In DefaultOTARequestor::OnQueryImageResponse(), for case OTAQueryStatus::kNotAvailable, transition to idle regardless of return value of UpdateNotFound() so that we can perform the next Query

* - Added CHIP_ERROR_MAX_RETRY_EXCEEDED to TestCHIPErrorStr.cpp
- Removed UpdateFailureState::kIdle case from DefaultOTARequestorDriver::HandleError()
- Removed ‘Ignore error and transition to Idle’ comments from DefaultOTARequestorDriver::HandleError()
- In DefaultOTARequestorDriver::ScheduleQueryRetry(), moved definition of providerLocation into the block where it’s used

* - In ExtendedOTARequestorDriver::UpdateAvailable(), added HandleError() call back.
- In enum class UpdateFailureState, removed kIdle
- In DefaultOTARequestorDriver::ScheduleQueryRetry(), removed status and willTryAnotherQuery variables.

* - In DefaultOTARequestorDriver::UpdateNotFound()< reduced code by calling ScheduleQueryRetry()

* - Added CHIP_ERROR_PROVIDER_LIST_EXHAUSTED error case to be used in DefaultOTARequestorDriver::ScheduleQueryRetry()

* - Remove DefaultOTARequestorDriver::HandleError()

* Remove un-used UpdateFailureState enum class

* In DefaultOTARequestorDriver::ScheduleQueryRetry(), add status variable to store return status and return at the end of the function to resolve CI compilation error:

../src/app/clusters/ota-requestor/DefaultOTARequestorDriver.cpp:449:9: error: do not use 'else' after 'return' [readability-else-after-return,-warnings-as-errors]

* In DefaultOTARequestorDriver::ScheduleQueryRetry(), add CHIP_NO_ERROR check before scheduling retry to prevent scheduling retry when provider list has been exhasuted or when max retries has been exhausted.

* Update function comments for RecordErrorUpdateState()
  • Loading branch information
isiu-apple authored and pull[bot] committed Jul 22, 2023
1 parent 483c282 commit 293c89a
Show file tree
Hide file tree
Showing 9 changed files with 144 additions and 136 deletions.
95 changes: 45 additions & 50 deletions src/app/clusters/ota-requestor/DefaultOTARequestor.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -153,7 +153,7 @@ void DefaultOTARequestor::OnQueryImageResponse(void * context, const QueryImageR
if (err != CHIP_NO_ERROR)
{
ChipLogError(SoftwareUpdate, "QueryImageResponse contains invalid fields: %" CHIP_ERROR_FORMAT, err.Format());
requestorCore->RecordErrorUpdateState(UpdateFailureState::kQuerying, err);
requestorCore->RecordErrorUpdateState(err);
return;
}

Expand All @@ -162,7 +162,7 @@ void DefaultOTARequestor::OnQueryImageResponse(void * context, const QueryImageR
if (!requestorCore->mProviderLocation.HasValue())
{
ChipLogError(SoftwareUpdate, "No provider location set");
requestorCore->RecordErrorUpdateState(UpdateFailureState::kQuerying, CHIP_ERROR_INCORRECT_STATE);
requestorCore->RecordErrorUpdateState(CHIP_ERROR_INCORRECT_STATE);
return;
}

Expand All @@ -174,7 +174,7 @@ void DefaultOTARequestor::OnQueryImageResponse(void * context, const QueryImageR
"The ImageURI provider node 0x" ChipLogFormatX64
" does not match the QueryImageResponse provider node 0x" ChipLogFormatX64,
ChipLogValueX64(update.nodeId), ChipLogValueX64(requestorCore->mProviderLocation.Value().providerNodeID));
requestorCore->RecordErrorUpdateState(UpdateFailureState::kQuerying, CHIP_ERROR_WRONG_NODE_ID);
requestorCore->RecordErrorUpdateState(CHIP_ERROR_WRONG_NODE_ID);
return;
}

Expand All @@ -199,7 +199,7 @@ void DefaultOTARequestor::OnQueryImageResponse(void * context, const QueryImageR
if (update.fileDesignator.size() > fileDesignator.size())
{
ChipLogError(SoftwareUpdate, "File designator size %zu is too large to store", update.fileDesignator.size());
requestorCore->RecordErrorUpdateState(UpdateFailureState::kQuerying, err);
requestorCore->RecordErrorUpdateState(CHIP_ERROR_BUFFER_TOO_SMALL);
return;
}
memcpy(fileDesignator.data(), update.fileDesignator.data(), update.fileDesignator.size());
Expand All @@ -214,25 +214,35 @@ void DefaultOTARequestor::OnQueryImageResponse(void * context, const QueryImageR
ChipLogDetail(SoftwareUpdate, "Available update version %" PRIu32 " is <= current version %" PRIu32 ", update ignored",
update.softwareVersion, requestorCore->mCurrentVersion);

requestorCore->RecordNewUpdateState(OTAUpdateStateEnum::kIdle, OTAChangeReasonEnum::kSuccess);
requestorCore->mOtaRequestorDriver->UpdateNotFound(UpdateNotFoundReason::kUpToDate,
System::Clock::Seconds32(response.delayedActionTime.ValueOr(0)));
requestorCore->RecordNewUpdateState(OTAUpdateStateEnum::kIdle, OTAChangeReasonEnum::kSuccess);
}

break;
}
case OTAQueryStatus::kBusy:
requestorCore->RecordNewUpdateState(OTAUpdateStateEnum::kDelayedOnQuery, OTAChangeReasonEnum::kDelayByProvider);
requestorCore->mOtaRequestorDriver->UpdateNotFound(UpdateNotFoundReason::kBusy,
System::Clock::Seconds32(response.delayedActionTime.ValueOr(0)));
case OTAQueryStatus::kBusy: {
CHIP_ERROR status = requestorCore->mOtaRequestorDriver->UpdateNotFound(
UpdateNotFoundReason::kBusy, System::Clock::Seconds32(response.delayedActionTime.ValueOr(0)));
if ((status == CHIP_ERROR_MAX_RETRY_EXCEEDED) || (status == CHIP_ERROR_PROVIDER_LIST_EXHAUSTED))
{
requestorCore->RecordNewUpdateState(OTAUpdateStateEnum::kIdle, OTAChangeReasonEnum::kSuccess);
}
else
{
requestorCore->RecordNewUpdateState(OTAUpdateStateEnum::kDelayedOnQuery, OTAChangeReasonEnum::kDelayByProvider);
}

break;
case OTAQueryStatus::kNotAvailable:
requestorCore->RecordNewUpdateState(OTAUpdateStateEnum::kIdle, OTAChangeReasonEnum::kSuccess);
}
case OTAQueryStatus::kNotAvailable: {
requestorCore->mOtaRequestorDriver->UpdateNotFound(UpdateNotFoundReason::kNotAvailable,
System::Clock::Seconds32(response.delayedActionTime.ValueOr(0)));
requestorCore->RecordNewUpdateState(OTAUpdateStateEnum::kIdle, OTAChangeReasonEnum::kSuccess);
break;
}
default:
requestorCore->RecordErrorUpdateState(UpdateFailureState::kQuerying, CHIP_ERROR_BAD_REQUEST);
requestorCore->RecordErrorUpdateState(CHIP_ERROR_BAD_REQUEST);
break;
}
}
Expand All @@ -252,7 +262,7 @@ void DefaultOTARequestor::OnQueryImageFailure(void * context, CHIP_ERROR error)
error = CHIP_ERROR_CONNECTION_CLOSED_UNEXPECTEDLY;
}

requestorCore->RecordErrorUpdateState(UpdateFailureState::kQuerying, error);
requestorCore->RecordErrorUpdateState(error);
}

void DefaultOTARequestor::OnApplyUpdateResponse(void * context, const ApplyUpdateResponse::DecodableType & response)
Expand All @@ -268,12 +278,12 @@ void DefaultOTARequestor::OnApplyUpdateResponse(void * context, const ApplyUpdat
requestorCore->mOtaRequestorDriver->UpdateConfirmed(System::Clock::Seconds32(response.delayedActionTime));
break;
case OTAApplyUpdateAction::kAwaitNextAction:
requestorCore->RecordNewUpdateState(OTAUpdateStateEnum::kDelayedOnApply, OTAChangeReasonEnum::kDelayByProvider);
requestorCore->mOtaRequestorDriver->UpdateSuspended(System::Clock::Seconds32(response.delayedActionTime));
requestorCore->RecordNewUpdateState(OTAUpdateStateEnum::kDelayedOnApply, OTAChangeReasonEnum::kDelayByProvider);
break;
case OTAApplyUpdateAction::kDiscontinue:
requestorCore->RecordNewUpdateState(OTAUpdateStateEnum::kIdle, OTAChangeReasonEnum::kSuccess);
requestorCore->mOtaRequestorDriver->UpdateDiscontinued();
requestorCore->RecordNewUpdateState(OTAUpdateStateEnum::kIdle, OTAChangeReasonEnum::kSuccess);
break;
}
}
Expand All @@ -284,7 +294,7 @@ void DefaultOTARequestor::OnApplyUpdateFailure(void * context, CHIP_ERROR error)
VerifyOrDie(requestorCore != nullptr);

ChipLogDetail(SoftwareUpdate, "ApplyUpdate failure response %" CHIP_ERROR_FORMAT, error.Format());
requestorCore->RecordErrorUpdateState(UpdateFailureState::kApplying, error);
requestorCore->RecordErrorUpdateState(error);
}

void DefaultOTARequestor::OnNotifyUpdateAppliedResponse(void * context, const app::DataModel::NullObjectType & response) {}
Expand All @@ -295,7 +305,7 @@ void DefaultOTARequestor::OnNotifyUpdateAppliedFailure(void * context, CHIP_ERRO
VerifyOrDie(requestorCore != nullptr);

ChipLogDetail(SoftwareUpdate, "NotifyUpdateApplied failure response %" CHIP_ERROR_FORMAT, error.Format());
requestorCore->RecordErrorUpdateState(UpdateFailureState::kNotifying, error);
requestorCore->RecordErrorUpdateState(error);
}

void DefaultOTARequestor::Reset()
Expand Down Expand Up @@ -339,17 +349,12 @@ void DefaultOTARequestor::HandleAnnounceOTAProvider(app::CommandHandler * comman

void DefaultOTARequestor::ConnectToProvider(OnConnectedAction onConnectedAction)
{
if (mServer == nullptr)
{
ChipLogError(SoftwareUpdate, "Server not set");
RecordErrorUpdateState(UpdateFailureState::kUnknown, CHIP_ERROR_INCORRECT_STATE);
return;
}
VerifyOrDie(mServer != nullptr);

if (!mProviderLocation.HasValue())
{
ChipLogError(SoftwareUpdate, "Provider location not set");
RecordErrorUpdateState(UpdateFailureState::kUnknown, CHIP_ERROR_INCORRECT_STATE);
RecordErrorUpdateState(CHIP_ERROR_INCORRECT_STATE);
return;
}

Expand All @@ -358,7 +363,7 @@ void DefaultOTARequestor::ConnectToProvider(OnConnectedAction onConnectedAction)
if (fabricInfo == nullptr)
{
ChipLogError(SoftwareUpdate, "Cannot find fabric");
RecordErrorUpdateState(UpdateFailureState::kUnknown, CHIP_ERROR_INCORRECT_STATE);
RecordErrorUpdateState(CHIP_ERROR_INCORRECT_STATE);
return;
}

Expand All @@ -373,32 +378,27 @@ void DefaultOTARequestor::ConnectToProvider(OnConnectedAction onConnectedAction)
if (err != CHIP_NO_ERROR)
{
ChipLogError(SoftwareUpdate, "Cannot establish connection to provider: %" CHIP_ERROR_FORMAT, err.Format());
RecordErrorUpdateState(UpdateFailureState::kUnknown, CHIP_ERROR_INCORRECT_STATE);
RecordErrorUpdateState(CHIP_ERROR_INCORRECT_STATE);
return;
}
}

void DefaultOTARequestor::DisconnectFromProvider()
{
if (mServer == nullptr)
{
ChipLogError(SoftwareUpdate, "Server not set");
RecordErrorUpdateState(UpdateFailureState::kUnknown, CHIP_ERROR_INCORRECT_STATE);
return;
}
VerifyOrDie(mServer != nullptr);

if (!mProviderLocation.HasValue())
{
ChipLogError(SoftwareUpdate, "Provider location not set");
RecordErrorUpdateState(UpdateFailureState::kUnknown, CHIP_ERROR_INCORRECT_STATE);
RecordErrorUpdateState(CHIP_ERROR_INCORRECT_STATE);
return;
}

FabricInfo * fabricInfo = mServer->GetFabricTable().FindFabricWithIndex(mProviderLocation.Value().fabricIndex);
if (fabricInfo == nullptr)
{
ChipLogError(SoftwareUpdate, "Cannot find fabric");
RecordErrorUpdateState(UpdateFailureState::kUnknown, CHIP_ERROR_INCORRECT_STATE);
RecordErrorUpdateState(CHIP_ERROR_INCORRECT_STATE);
return;
}

Expand Down Expand Up @@ -444,7 +444,7 @@ void DefaultOTARequestor::OnConnected(void * context, OperationalDeviceProxy * d
if (err != CHIP_NO_ERROR)
{
ChipLogError(SoftwareUpdate, "Failed to send QueryImage command: %" CHIP_ERROR_FORMAT, err.Format());
requestorCore->RecordErrorUpdateState(UpdateFailureState::kQuerying, err);
requestorCore->RecordErrorUpdateState(err);
return;
}
break;
Expand All @@ -455,7 +455,7 @@ void DefaultOTARequestor::OnConnected(void * context, OperationalDeviceProxy * d
if (err != CHIP_NO_ERROR)
{
ChipLogError(SoftwareUpdate, "Failed to start download: %" CHIP_ERROR_FORMAT, err.Format());
requestorCore->RecordErrorUpdateState(UpdateFailureState::kDownloading, err);
requestorCore->RecordErrorUpdateState(err);
return;
}
break;
Expand All @@ -466,7 +466,7 @@ void DefaultOTARequestor::OnConnected(void * context, OperationalDeviceProxy * d
if (err != CHIP_NO_ERROR)
{
ChipLogError(SoftwareUpdate, "Failed to send ApplyUpdate command: %" CHIP_ERROR_FORMAT, err.Format());
requestorCore->RecordErrorUpdateState(UpdateFailureState::kApplying, err);
requestorCore->RecordErrorUpdateState(err);
return;
}
break;
Expand All @@ -477,7 +477,7 @@ void DefaultOTARequestor::OnConnected(void * context, OperationalDeviceProxy * d
if (err != CHIP_NO_ERROR)
{
ChipLogError(SoftwareUpdate, "Failed to send NotifyUpdateApplied command: %" CHIP_ERROR_FORMAT, err.Format());
requestorCore->RecordErrorUpdateState(UpdateFailureState::kNotifying, err);
requestorCore->RecordErrorUpdateState(err);
return;
}
break;
Expand All @@ -499,13 +499,13 @@ void DefaultOTARequestor::OnConnectionFailure(void * context, PeerId peerId, CHI
switch (requestorCore->mOnConnectedAction)
{
case kQueryImage:
requestorCore->RecordErrorUpdateState(UpdateFailureState::kQuerying, error);
requestorCore->RecordErrorUpdateState(error);
break;
case kDownload:
requestorCore->RecordErrorUpdateState(UpdateFailureState::kDownloading, error);
requestorCore->RecordErrorUpdateState(error);
break;
case kApplyUpdate:
requestorCore->RecordErrorUpdateState(UpdateFailureState::kApplying, error);
requestorCore->RecordErrorUpdateState(error);
break;
default:
break;
Expand Down Expand Up @@ -567,7 +567,7 @@ void DefaultOTARequestor::NotifyUpdateApplied()
if (DeviceLayer::ConfigurationMgr().GetProductId(productId) != CHIP_NO_ERROR)
{
ChipLogError(SoftwareUpdate, "Cannot get Product ID");
RecordErrorUpdateState(UpdateFailureState::kUnknown, CHIP_ERROR_INCORRECT_STATE);
RecordErrorUpdateState(CHIP_ERROR_INCORRECT_STATE);
return;
}

Expand Down Expand Up @@ -618,8 +618,7 @@ void DefaultOTARequestor::OnDownloadStateChanged(OTADownloader::State state, OTA
case OTADownloader::State::kIdle:
if (reason != OTAChangeReasonEnum::kSuccess)
{
// TODO: Should we call some driver API to give it a chance to reschedule?
RecordErrorUpdateState(UpdateFailureState::kDownloading, CHIP_ERROR_CONNECTION_ABORTED, reason);
RecordErrorUpdateState(CHIP_ERROR_CONNECTION_ABORTED, reason);
}

break;
Expand Down Expand Up @@ -669,13 +668,12 @@ void DefaultOTARequestor::RecordNewUpdateState(OTAUpdateStateEnum newState, OTAC
}
OtaRequestorServerOnStateTransition(mCurrentUpdateState, newState, reason, targetSoftwareVersion);

// Issue#16151 tracks re-factoring error and state transitioning handling.
if ((newState == OTAUpdateStateEnum::kIdle) && (mCurrentUpdateState != OTAUpdateStateEnum::kIdle))
{
IdleStateReason idleStateReason = MapErrorToIdleStateReason(error);

// Inform the driver that the core logic has entered the Idle state
mOtaRequestorDriver->HandleIdleState(idleStateReason);
mOtaRequestorDriver->HandleIdleStateEnter(idleStateReason);
}
else if ((mCurrentUpdateState == OTAUpdateStateEnum::kIdle) && (newState != OTAUpdateStateEnum::kIdle))
{
Expand All @@ -685,11 +683,8 @@ void DefaultOTARequestor::RecordNewUpdateState(OTAUpdateStateEnum newState, OTAC
mCurrentUpdateState = newState;
}

void DefaultOTARequestor::RecordErrorUpdateState(UpdateFailureState failureState, CHIP_ERROR error, OTAChangeReasonEnum reason)
void DefaultOTARequestor::RecordErrorUpdateState(CHIP_ERROR error, OTAChangeReasonEnum reason)
{
// Inform driver of the error
mOtaRequestorDriver->HandleError(failureState, error);

// Log the DownloadError event
OTAImageProcessorInterface * imageProcessor = mBdxDownloader->GetImageProcessorDelegate();
VerifyOrDie(imageProcessor != nullptr);
Expand Down
5 changes: 2 additions & 3 deletions src/app/clusters/ota-requestor/DefaultOTARequestor.h
Original file line number Diff line number Diff line change
Expand Up @@ -196,10 +196,9 @@ class DefaultOTARequestor : public OTARequestorInterface, public BDXDownloader::
void RecordNewUpdateState(OTAUpdateStateEnum newState, OTAChangeReasonEnum reason, CHIP_ERROR error = CHIP_NO_ERROR);

/**
* Record the error update state by informing the driver of the error and calling `RecordNewUpdateState`
* Record the error update state and transition to the idle state
*/
void RecordErrorUpdateState(UpdateFailureState failureState, CHIP_ERROR error,
OTAChangeReasonEnum reason = OTAChangeReasonEnum::kFailure);
void RecordErrorUpdateState(CHIP_ERROR error, OTAChangeReasonEnum reason = OTAChangeReasonEnum::kFailure);

/**
* Generate an update token using the operational node ID in case of token lost, received in QueryImageResponse
Expand Down
Loading

0 comments on commit 293c89a

Please sign in to comment.