Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

{RDBMS} Fix "Message: The subscription '00000000-0000-0000-0000-000000000000' could not be found." test error by waiting for Long Running Operations #17185

Merged
merged 3 commits into from
Mar 5, 2021

Conversation

jiasli
Copy link
Member

@jiasli jiasli commented Mar 3, 2021

Symptom

RDBMS tests frequently fail (like in #17127):

https://dev.azure.com/azure-sdk/public/_build/results?buildId=759999&view=logs&j=74095127-2a27-5370-37ed-15a4193f243f&t=5cf5f89a-09fb-583c-72e4-4e4427c89fb1&l=167520

=================================== FAILURES ===================================
_ FlexibleServerVnetMgmtScenarioTest.test_postgres_flexible_server_vnet_mgmt_supplied_vname_and_subnetname _
...
self = <azure.mgmt.resource.resources.v2020_10_01.operations._resource_groups_operations.ResourceGroupsOperations object at 0x7ff337899828>
...
E           msrestazure.azure_exceptions.CloudError: Azure Error: SubscriptionNotFound
E           Message: The subscription '00000000-0000-0000-0000-000000000000' could not be found.

Cause

When virtual_networks.begin_create_or_update is not waited, requests will be called by the LROPoller thread and MainThread at the same time. As VCR.py is not designed to be thread-safe (kevin1024/vcrpy#213 (comment)), some request will bypass VCR and reach to internet:

https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/clitest.rg000002/providers/Microsoft.DBForMySql/flexibleServers/testvnetserver8mysql?api-version=2020-07-01-privatepreview

# With line breaks
https://management.azure.com
    /subscriptions/00000000-0000-0000-0000-000000000000
    /resourceGroups/clitest.rg000002
    /providers/Microsoft.DBForMySql/flexibleServers
    /testvnetserver8mysql
    ?api-version=2020-07-01-privatepreview

As 00000000-0000-0000-0000-000000000000 is not a valid subscription ID, this error is triggered.

Repro

Replace

if [[ "$(System.PullRequest.TargetBranch)" != "" ]]; then
azdev test --series --no-exitfirst --repo=./ --src=HEAD --tgt=origin/$(System.PullRequest.TargetBranch) --cli-ci --profile ${{ parameters.profile }}
else
azdev test --series --no-exitfirst --profile ${{ parameters.profile }}
fi

with

pip install --force-reinstall --no-deps "git+https://github.com/microsoft/knack@master"

count=0
while pytest --log-level=DEBUG --log-format "%(levelname)s %(name)s %(thread)d %(threadName)s %(message)s" ./src/azure-cli/azure/cli/command_modules/rdbms/tests/latest/test_rdbms_flexible_commands.py::FlexibleServerVnetMgmtScenarioTest::test_mysql_flexible_server_vnet_mgmt_supplied_subnet_id_in_different_rg; do
    echo Try $((count++))
done

so that test_mysql_flexible_server_vnet_mgmt_supplied_subnet_id_in_different_rg is endlessly retried until failure.

The debug logs shows execution details:

DEBUG azure.core.pipeline.policies._universal MainThread Request URL: 'https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/clitest.rg000002/providers/Microsoft.Network/virtualNetworks/clitestvnet8?api-version=2020-08-01'
DEBUG azure.core.pipeline.policies._universal MainThread Request method: 'PUT'
...
DEBUG azure.core.pipeline.policies._universal MainThread Response status: 201
DEBUG azure.core.pipeline.policies._universal MainThread Response headers:
...
DEBUG azure.core.pipeline.policies._universal MainThread     'azure-asyncoperation': 'https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Network/locations/eastus2euap/operations/ceee5d74-e568-4ebe-a3f1-ee60bfb62601?api-version=2020-08-01'


DEBUG azure.core.pipeline.policies._universal LROPoller Request URL: 'https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Network/locations/eastus2euap/operations/ceee5d74-e568-4ebe-a3f1-ee60bfb62601?api-version=2020-08-01'
DEBUG azure.core.pipeline.policies._universal LROPoller Request method: 'GET'


DEBUG azure.core.pipeline.policies._universal MainThread Request URL: 'https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/clitest.rg000002/providers/Microsoft.Network/virtualNetworks/clitestvnet8/subnets/Subnetetserver8mysql?api-version=2020-08-01'
DEBUG azure.core.pipeline.policies._universal MainThread Request method: 'PUT'

Notice that LROPoller hadn't finished yet and MainThread already started a new request.

@jiasli jiasli requested review from fengzhou-msft and jsntcy March 3, 2021 09:39
@yonzhan yonzhan requested review from zhoxing-ms and msyyc March 3, 2021 10:49
@yonzhan yonzhan added this to the S184 milestone Mar 3, 2021
@yonzhan
Copy link
Collaborator

yonzhan commented Mar 3, 2021

Another good example for troubleshooting and drill to the end:)

@jiasli
Copy link
Member Author

jiasli commented Mar 3, 2021

We can use a simple demo to show VCR.py is not thread-safe:

import vcr
import requests
import logging
import threading

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s\t%(name)s\t%(thread)d\t%(threadName)s\t%(message)s")


def thread_function():
    inner_response = requests.get('https://httpbin.org/get')
    print(inner_response.status_code)


with vcr.use_cassette('recordings.yaml'):
    new_thread = threading.Thread(target=thread_function)
    new_thread.start()

    response = requests.get('https://httpbin.org/get')
    print(response.status_code)
    new_thread.join()

Even there is only one entry in the recording, the script will succeed, with one request going to the recording, one request going to the internet.

This is because VCR.py has a force_reset logic which temporarily disables the mock. When the mock is temporarily being disabled by one thread, the other thread will create a real urllib3.connection.HTTPConnection instead of vcr.patch.VCRRequestsHTTPConnectionrecordings.yaml, thus connecting to internet instead of the recording.

Detail: kevin1024/vcrpy#295, kevin1024/vcrpy#212

@jiasli
Copy link
Member Author

jiasli commented Mar 4, 2021

The change caused 3 test failures

FAILED src/azure-cli/azure/cli/command_modules/rdbms/tests/latest/test_rdbms_flexible_commands_mysql.py::MySqlFlexibleServerVnetServerMgmtScenarioTest::test_mysql_flexible_server_vnet_server_create
FAILED src/azure-cli/azure/cli/command_modules/rdbms/tests/latest/test_rdbms_flexible_commands_postgres.py::PostgresFlexibleServerVnetServerMgmtScenarioTest::test_postgres_flexible_server_vnet_ha_server_create
FAILED src/azure-cli/azure/cli/command_modules/rdbms/tests/latest/test_rdbms_flexible_commands_postgres.py::PostgresFlexibleServerVnetServerMgmtScenarioTest::test_postgres_flexible_server_vnet_server_create

because Canceled is now raised as errors.

https://dev.azure.com/azure-sdk/public/_build/results?buildId=763239&view=logs&jobId=771f6cdf-8d98-5d9b-2fb9-d3a6e9554788&j=771f6cdf-8d98-5d9b-2fb9-d3a6e9554788&t=0a182f1d-9946-580d-467e-4d4ebd6af32d

E           azure.core.exceptions.HttpResponseError: (Canceled) Operation was canceled.

env/lib/python3.8/site-packages/azure/core/polling/base_polling.py:500: HttpResponseError

@evelyn-ys
Copy link
Member

@DaeunYim for aware

@evelyn-ys evelyn-ys requested a review from DaeunYim March 4, 2021 09:51
@jiasli jiasli mentioned this pull request Mar 4, 2021
3 tasks
@@ -176,7 +176,7 @@ def _create_vnet_subnet_delegation(nw_client, resource_group, vnet_name, subnet_
location=location,
address_space=AddressSpace(
address_prefixes=[
vnet_address_pref])))
vnet_address_pref]))).result()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another solution proposed by @zhoxing-ms is to supply polling=False to nw_client.virtual_networks.begin_create_or_update so that LROPoller is created with NoPolling:

https://github.com/Azure/azure-sdk-for-python/blob/ad5308d9befefdc0cdc61c4a47a3329fd874b8a0/sdk/network/azure-mgmt-network/azure/mgmt/network/v2020_08_01/operations/_virtual_networks_operations.py#L340-L341

if polling is True: polling_method = ARMPolling(lro_delay, lro_options={'final-state-via': 'azure-async-operation'}, path_format_arguments=path_format_arguments,  **kwargs)
elif polling is False: polling_method = NoPolling()

This stops SDK from polling the result. The Virtual Network creation will then be superseded/canceled by Subnet creation.

Pros

  • This is faster as it doesn't care about the result of Virtual Network creation.

Cons

  • This leaves the result of Virtual Network creation unchecked. If the Virtual Network creation actually fails, the error will be reported at
    return nw_client.subnets.begin_create_or_update(resource_group, vnet_name, subnet_name,
    subnet_result).result()

    making troubleshooting difficult.
  • This depends on the service's ability to allow one operation to be superseded by another. In many other services like Storage, this is not always the case - we can't create a container before Storage Account creation finished.

Copy link
Contributor

@DaeunYim DaeunYim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Please merge and let me continue working based on this change. Thanks a lot, Jiashuo and team!

@jiasli jiasli merged commit 342a51e into Azure:dev Mar 5, 2021
@jiasli jiasli deleted the rdbms branch March 5, 2021 02:48
@jiasli jiasli changed the title {RDBMS} Fix tests by waiting for Long Running Operations {RDBMS} Fix "Message: The subscription '00000000-0000-0000-0000-000000000000' could not be found." test error by waiting for Long Running Operations Apr 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants