
[core-lro] "The long-running operation has failed" is an unclear message for customers #25374

Closed
kazrael2119 opened this issue Mar 27, 2023 · 12 comments
Assignees: deyaaeldeen
Labels: Azure.Core, Client (This issue points to a problem in the data-plane of the library.)

Comments

@kazrael2119
Contributor

When an SDK operation call fails, the error response only says "The long-running operation has failed", which is not useful. It would be very helpful to get the exact reason in the error message instead.

"The long-running operation has failed" is an unclear message when we get this error. We can't know where the error comes from so that we don't know how to fix it.

Could we improve this experience?

@MaryGao MaryGao changed the title "The long-running operation has failed" is an unclear message for customers "The long-running operation has failed" is a unclear message for customers Mar 27, 2023
@deyaaeldeen
Member

deyaaeldeen commented Mar 27, 2023

The design guidelines say there should be an error field that contains an error if the LRO status is Failed; see bullet 5 here. I think we can update core-lro to print the content of that error field in this case. What do you think?
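For illustration, a minimal sketch of that idea (the interface and helper names below are hypothetical, not the actual core-lro internals): when the status payload carries an error object, fold its code and message into the thrown error.

```ts
// Hypothetical sketch only -- not the actual core-lro internals. When the
// operation status payload carries an "error" object, include its code and
// message in the thrown error instead of a generic message.
interface OperationStatusPayload {
  status: string;
  error?: { code: string; message: string };
}

function buildFailureError(payload: OperationStatusPayload): Error {
  if (payload.error) {
    return new Error(
      `The long-running operation has failed. Code: ${payload.error.code}. Message: ${payload.error.message}`
    );
  }
  return new Error("The long-running operation has failed");
}
```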

@kazrael2119
Contributor Author

> The design guidelines say there should be an error field that contains an error if the LRO status is Failed; see bullet 5 here. I think we can update core-lro to print the content of that error field in this case. What do you think?

Sounds great

@MaryGao
Member

MaryGao commented Mar 27, 2023

@deyaaeldeen Agreed. In our recent experience, when the provisioningState is Failed, core-lro throws the exception directly. If we want to know the exact error message, we have to dig into the response body via the onResponse callback (a rough sketch of that workaround follows at the end of this comment) or debug into the detailed payload. The most useful information would be the error code and message.

One thing that needs more attention here: we can hit the LRO failure in two stages, one during the polling process and the other in the final GET. Could we improve both so these errors are surfaced?

Case 1: Failure payload during polling

{
  "id": "/subscriptions/xxx/providers/Microsoft.Workloads/locations/EASTUS/operationStatuses/xxx",
  "name": "xxx",
  "resourceId": "/subscriptions/xxx/resourceGroups/sdk-testing-vidya6_22_3_2023/providers/Microsoft.Workloads/monitors/haMonitor4/providerInstances/promethueusHa-provider",
  "status": "Failed",
  "startTime": "2023-03-23T10:49:28.6416273Z",
  "error": {
    "code": "InvalidProviderNameLength",
    "message": "\\n      The length of given Provider name: 'promethueusHa-provider' is not valid.\\n      Recommended Action: The valid length of provider name is minimum of 2 and maximum of 20 characters long.\\n    "
  }
}

Case 2: Failure payload in the final GET

{
  "id": "/subscriptions/xxx/resourceGroups/sdk-testing-monitor_23_3_2023/providers/Microsoft.Workloads/monitors/haMonitor",
  "name": "haMonitor",
  "type": "microsoft.workloads/monitors",
  "systemData": {
    "createdBy": "5910ab76-9a08-40f9-88d4-48d5f01393a8",
    "createdByType": "Application",
    "createdAt": "2023-03-23T10:01:26.039Z",
    "lastModifiedBy": "5910ab76-9a08-40f9-88d4-48d5f01393a8",
    "lastModifiedByType": "Application",
    "lastModifiedAt": "2023-03-23T10:05:29.264Z"
  },
  "tags": { "sdk-testing": "true" },
  "location": "eastus",
  "provisioningState": "Failed",
  "errors": {
    "code": "ManagedResourceGroupAlreadyExists",
    "message": "\n      Managed Resource Group creation failed.\n      Possible Causes: Resource Group with Provided name already exists and managed by resource with Id '/subscriptions/49d64d54-e966-4c46-a868-1999802b762c/resourceGroups/sdk-testing-monitor_21_3_2023/providers/Microsoft.Workloads/monitors/haMonitor'.\n      Recommended Action: Retry the operation with another Managed Resource Group Name. If the issue persists, please create an issue here :- https://aka.ms/AAgc4fy.\n    ",
    "details": []
  },
...


}

/cc @gargankit-microsoft
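As a side note, the onResponse workaround mentioned above looks roughly like the sketch below. The client call in the trailing comment is a hypothetical stand-in for a generated management-plane method; the onResponse callback itself comes from the shared operation options in @azure/core-client.

```ts
import type { FullOperationResponse } from "@azure/core-client";

// Log every raw polling response so a "Failed" status payload (including
// its error.code / error.message) becomes visible without debugging.
const options = {
  onResponse: (rawResponse: FullOperationResponse): void => {
    console.log(rawResponse.status, rawResponse.bodyAsText);
  },
};

// Hypothetical usage with a generated client:
// await client.monitors.beginCreateAndWait(resourceGroupName, monitorName, resource, options);
```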

@deyaaeldeen deyaaeldeen transferred this issue from Azure/autorest.typescript Mar 27, 2023
@ghost ghost added the needs-triage (Workflow: This is a new issue that needs to be triaged to the appropriate team.) label Mar 27, 2023
@deyaaeldeen deyaaeldeen self-assigned this Mar 27, 2023
@deyaaeldeen deyaaeldeen changed the title "The long-running operation has failed" is a unclear message for customers [core-lro] "The long-running operation has failed" is a unclear message for customers Mar 27, 2023
@deyaaeldeen deyaaeldeen added the Azure.Core and Client (This issue points to a problem in the data-plane of the library.) labels Mar 27, 2023
@ghost ghost removed the needs-triage (Workflow: This is a new issue that needs to be triaged to the appropriate team.) label Mar 27, 2023
@MaryGao MaryGao changed the title [core-lro] "The long-running operation has failed" is a unclear message for customers [core-lro] "The long-running operation has failed" is an unclear message for customers Mar 27, 2023
@suhanishri

Hi @MaryGao, @gargankit-microsoft, I tried adding the options parameter to the client start/stop requests and I am able to see the error information in the raw logging. Attaching the output snapshot from the "Stop Central Server Instance" test where provisioning failed:

Creating RG : sdk-testing-vis6_JS_29_3_2023
Raw Response: {
status: 200,
headers: HttpHeadersImpl {
_headersMap: Map(15) {
'cache-control' => [Object],
'pragma' => [Object],
'transfer-encoding' => [Object],
'content-type' => [Object],
'content-encoding' => [Object],
'expires' => [Object],
'etag' => [Object],
'vary' => [Object],
'x-ms-ratelimit-remaining-subscription-reads' => [Object],
'x-ms-request-id' => [Object],
'x-ms-correlation-request-id' => [Object],
'x-ms-routing-request-id' => [Object],
'strict-transport-security' => [Object],
'x-content-type-options' => [Object],
'date' => [Object]
}
},
request: PipelineRequestImpl {
url: 'https://management.azure.com/subscriptions/49d64d54-e966-4c46-a868-1999802b762c/providers/Microsoft.Workloads/locations/EASTUS2EUAP/operationStatuses/4b38211b-1372-4a45-ab1a-fc4996c4aec6*8AEAFF279C9AF8A9037951D6A3A2A1D6507221CC592E1E361EABEC79A23726FE?api-version=2023-04-01',
body: undefined,
headers: HttpHeadersImpl {
_headersMap: [Map]
},
method: 'GET',
timeout: 0,
formData: undefined,
disableKeepAlive: false,
proxySettings: undefined,
streamResponseStatusCodes: Set(0) {},
withCredentials: false,
abortSignal: undefined,
tracingOptions: undefined,
onUploadProgress: undefined,
onDownloadProgress: undefined,
requestId: '190064e3-9f06-4177-9c8e-9e2adeb1cd0c',
allowInsecureConnection: false,
enableBrowserStreams: false
},
bodyAsText: '{"id":"/subscriptions/49d64d54-e966-4c46-a868-1999802b762c/providers/Microsoft.Workloads/locations/EASTUS2EUAP/operationStatuses/4b38211b-1372-4a45-ab1a-fc4996c4aec68AEAFF279C9AF8A9037951D6A3A2A1D6507221CC592E1E361EABEC79A23726FE","name":"4b38211b-1372-4a45-ab1a-fc4996c4aec68AEAFF279C9AF8A9037951D6A3A2A1D6507221CC592E1E361EABEC79A23726FE","resourceId":"/subscriptions/49d64d54-e966-4c46-a868-1999802b762c/resourceGroups/DemoRGVIS/providers/Microsoft.Workloads/sapVirtualInstances/DRT/centralInstances/cs0","status":"Failed","startTime":"2023-03-29T09:05:42.2190468Z","endTime":"2023-03-29T09:05:44.2440293Z","error":{"code":"SapOpsValidationFailureUnsupportedState","message":"\n\t The Stop operation failed on the CentralInstance cs0.\n\t Possible Causes: The SAP resource is already in Unavailable.\n\t Recommended Action: The CentralInstance should be in Running or PartiallyRunning status for the Stop oepration to begin. The current status is Unavailable.\n "}}',
parsedBody: {
id: '/subscriptions/49d64d54-e966-4c46-a868-1999802b762c/providers/Microsoft.Workloads/locations/EASTUS2EUAP/operationStatuses/4b38211b-1372-4a45-ab1a-fc4996c4aec68AEAFF279C9AF8A9037951D6A3A2A1D6507221CC592E1E361EABEC79A23726FE',
name: '4b38211b-1372-4a45-ab1a-fc4996c4aec68AEAFF279C9AF8A9037951D6A3A2A1D6507221CC592E1E361EABEC79A23726FE',
status: 'Failed',
startTime: 2023-03-29T09:05:42.219Z,
endTime: 2023-03-29T09:05:44.244Z,
error: {
code: 'SapOpsValidationFailureUnsupportedState',
message: '\n' +
'\t The Stop operation failed on the CentralInstance cs0.\n' +
'\t Possible Causes: The SAP resource is already in Unavailable.\n' +
'\t Recommended Action: The CentralInstance should be in Running or PartiallyRunning status for the Stop oepration to begin. The current status is Unavailable.\n' +
' '
},
resourceId: '/subscriptions/49d64d54-e966-4c46-a868-1999802b762c/resourceGroups/DemoRGVIS/providers/Microsoft.Workloads/sapVirtualInstances/DRT/centralInstances/cs0'
}
}

We are getting a proper error message in the error field:
"error":{"code":"SapOpsValidationFailureUnsupportedState","message":"\n\t The Stop operation failed on the CentralInstance cs0.\n\t Possible Causes: The SAP resource is already in Unavailable.\n\t Recommended Action: The CentralInstance should be in Running or PartiallyRunning status for the Stop oepration to begin. The current status is Unavailable.\n "}

@gargankit-microsoft
Member

  1. @suhanishri This seems like a client error; should we retry with the correct payload here?
  2. @suhanishri The error message seems to have typos; should we raise an issue with the service team?
  3. @MaryGao, this seems similar to what we already logged in the ams RT too.

@MaryGao
Member

MaryGao commented Apr 3, 2023

@gargankit-microsoft Confirmed offline with Suhani; I think this is a similar issue, so we will track it in this issue as well.

deyaaeldeen added a commit that referenced this issue Apr 11, 2023
### Packages impacted by this PR
@azure/core-lro

### Issues associated with this PR
#25374. Note that CI
failures are unrelated.

### Describe the problem that is addressed by this PR
Error messages when polling fails don't contain information on why the failure happened.

### What are the possible designs available to address the problem? If there are more than one possible design, why was the one in this PR chosen?
N/A

### Are there test cases added in this PR? _(If not, why?)_
Yes

### Provide a list of related PRs _(if any)_
N/A

### Command used to generate this PR: _(Applicable only to SDK release request PRs)_

### Checklists
- [x] Added impacted package name to the issue description
- [ ] Does this PR need any fixes in the SDK Generator? _(If so, create an Issue in the [Autorest/typescript](https://github.com/Azure/autorest.typescript) repository and link it here)_
- [x] Added a changelog (if necessary)
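For context, a rough caller-side sketch of what this change is meant to enable; the poller interface below is a minimal hypothetical stand-in rather than the real core-lro type, and the error text mentioned in the comment is illustrative only.

```ts
// Minimal hypothetical stand-in for a poller returned by a generated LRO method.
interface MinimalPoller<TResult> {
  pollUntilDone(): Promise<TResult>;
}

async function waitForResult<TResult>(poller: MinimalPoller<TResult>): Promise<TResult | undefined> {
  try {
    return await poller.pollUntilDone();
  } catch (e) {
    // After this change, the rejection is expected to include the service's
    // error code and message (for example the "InvalidProviderNameLength"
    // payload earlier in this thread) instead of only
    // "The long-running operation has failed".
    console.error((e as Error).message);
    return undefined;
  }
}
```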
@deyaaeldeen
Member

I merged in a fix and I am planning on making a release in May. Do we need to release sooner?

@MaryGao
Member

MaryGao commented Apr 12, 2023

Thanks for the fix, I think we can release it in May.

@MaryGao
Member

MaryGao commented Apr 25, 2023

@kazrael2119 Could you help bump the version of core-lro in codegen?

@deyaaeldeen
Member

> @kazrael2119 Could you help bump the version of core-lro in codegen?

Please note this fix has not been released yet.

@kazrael2119
Contributor Author

> @kazrael2119 Could you help bump the version of core-lro in codegen?
>
> Please note this fix has not been released yet.

I'll keep track of this.

@MaryGao
Member

MaryGao commented May 17, 2023

Closing this as fixed.

@MaryGao MaryGao closed this as completed May 17, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Aug 15, 2023