
[AVM Module Issue]: Deployment Concurrency issue with Private Endpoints #962

Closed
1 task done
tyconsulting opened this issue Feb 8, 2024 · 12 comments · Fixed by #1089, #1090, #1091, #1092 or #1087
Labels
Needs: Immediate Attention ‼️ Immediate attention of module owner / AVM team is needed Needs: Triage 🔍 Maintainers need to triage still Status: Response Overdue 🚩 When an issue/PR has not been responded to for X amount of days Type: AVM 🅰️ ✌️ Ⓜ️ This is an AVM related issue Type: Bug 🐛 Something isn't working

Comments

@tyconsulting
Contributor

Check for previous/existing GitHub issues

  • I have checked for previous/existing GitHub issues

Issue Type?

Bug

Module Name

Other, as defined below...

(Optional) Module Name if not listed above

All modules calling the Private Endpoint module

(Optional) Module Version

No response

Description

All modules that deploy Private Endpoints call the Private Endpoint module in a `for` loop, so the endpoints are deployed in parallel. This can cause issues when more than one Private Endpoint is created at the same time:
[screenshot of the deployment error]

To work around the issue, I suggest using the @batchSize decorator in all modules that call the Private Endpoint module, limiting them to one concurrent deployment. For example:

```bicep
@batchSize(1)
module workspace_privateEndpoints 'br/public:avm/res/network/private-endpoint:0.3.3' = [for (privateEndpoint, index) in (privateEndpoints ?? []): {
  ...
}]
```

I have tested this in my lab and I can confirm it fixed the error I have shown above.
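For a self-contained illustration outside the AVM modules, the same workaround can be sketched on a plain `Microsoft.Network/privateEndpoints` resource loop. This is a minimal sketch, not the AVM implementation: the parameter names (`privateLinkResourceId`, `privateEndpoints` with hypothetical `subnetResourceId`/`groupId` fields) and the `pe-`/`plsc-` naming scheme are assumptions made for the example. `@batchSize(1)` tells Azure Resource Manager to deploy the loop's instances one after another instead of in parallel.

```bicep
// Sketch only: parameter names and shapes below are hypothetical,
// not the interface of the AVM private-endpoint module.

@description('Resource ID of the private-link resource (e.g. a Databricks workspace).')
param privateLinkResourceId string

@description('Hypothetical list of endpoints; each item has subnetResourceId and groupId.')
param privateEndpoints array = []

param location string = resourceGroup().location

// Serialize the loop: only one private endpoint is deployed at a time,
// so the parent resource is never updated by two endpoints concurrently.
@batchSize(1)
resource serialPrivateEndpoints 'Microsoft.Network/privateEndpoints@2023-04-01' = [for (endpoint, index) in privateEndpoints: {
  name: 'pe-${index}'
  location: location
  properties: {
    subnet: {
      id: endpoint.subnetResourceId
    }
    privateLinkServiceConnections: [
      {
        name: 'plsc-${index}'
        properties: {
          privateLinkServiceId: privateLinkResourceId
          groupIds: [
            endpoint.groupId
          ]
        }
      }
    ]
  }
}]
```

With two entries that share a group ID (for example `databricks_ui_api`) but point at subnets in different VNets, the endpoints are created sequentially. The trade-off is a longer deployment, since the endpoints no longer deploy in parallel.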

cc @segraef @AlexanderSehr

(Optional) Correlation Id

No response

@tyconsulting tyconsulting added Needs: Triage 🔍 Maintainers need to triage still Type: AVM 🅰️ ✌️ Ⓜ️ This is an AVM related issue labels Feb 8, 2024
@github-project-automation github-project-automation bot moved this to Todo in Bicep Feb 8, 2024
@microsoft-github-policy-service microsoft-github-policy-service bot added the Type: Bug 🐛 Something isn't working label Feb 8, 2024
@AlexanderSehr
Contributor

Hmm, that's new. A batchSize would surely solve the issue, but it's a shame, given that it slows down the deployment. Ah well...
@segraef I guess we could include this in the previously discussed PE schema change and likewise roll it out in the update to all modules.
Curiously, to my knowledge this never came up in CARML over the last few years. @tyconsulting, I take it you haven't seen this issue before?

@tyconsulting
Contributor Author


No, this is the first time I'm seeing this. I wonder whether it depends on the resource provider, but this is also the first time I'm creating two PEs for Databricks via this loop. Both PEs have the same group ID but are connected to different VNets. I also wonder whether the failure has anything to do with the shared group ID.

@segraef
Contributor

segraef commented Feb 13, 2024

Can you give us your deployment code for reference, @tyconsulting?

@tyconsulting
Contributor Author


I'll share it with you privately tomorrow. There's a massive power outage here in VIC right now, and I'm on my mobile.

@AlexanderSehr
Contributor

AlexanderSehr commented Feb 13, 2024

Hey @segraef,
I guess the snippet in the description should do the trick:

```bicep
@batchSize(1)
module workspace_privateEndpoints 'br/public:avm/res/network/private-endpoint:0.3.3' = [for (privateEndpoint, index) in (privateEndpoints ?? []): {
  (...)
}]
```

I'd suggest we go about this in two steps. First, let's add a second PE deployment to each module in a PR and trigger the pipelines for testing. If they fail consistently, we add the batch size to all modules. If not, the issue may only affect some modules, which we then update in the same PR. Thoughts?

@microsoft-github-policy-service microsoft-github-policy-service bot added the Status: Response Overdue 🚩 When an issue/PR has not been responded to for X amount of days label Feb 16, 2024
@microsoft-github-policy-service microsoft-github-policy-service bot added the Needs: Immediate Attention ‼️ Immediate attention of module owner / AVM team is needed label Feb 21, 2024
@segraef
Contributor

segraef commented Feb 23, 2024

@tyconsulting, just for due diligence, may I have the deployment code you used so I can reproduce the error?
We're currently testing all multi-PE services in segraef#7, with up to 7 concurrent PE deployments and without batchSize(1), and have seen no issues so far.

https://github.com/segraef/bicep-registry-modules/actions/runs/8016895436/job/21899816418

@tyconsulting
Contributor Author

Yeah, I'll ping you on Monday and give you access to my Databricks code in my ADO project.

This was referenced Feb 23, 2024
@segraef
Contributor

segraef commented Feb 24, 2024

I replicated the error: `Call to Microsoft.Databricks/workspaces failed. Error message: Workspace update could not be completed because it has been updated by another process. (Code: ConcurrentUpdateError)`. It seems to be specific to Databricks workspaces; all other services with concurrent PE deployments run fine (#1070).

@segraef
Contributor

segraef commented Feb 24, 2024

@segraef
Contributor

segraef commented Feb 25, 2024

AlexanderSehr pushed a commit that referenced this issue Feb 25, 2024
## Description
Currently testing all multi-PE services, as listed in https://learn.microsoft.com/en-us/azure/private-link/private-endpoint-overview#private-link-resource

| Private-link resource name | Resource type | Sub-resources |
| -- | -- | -- |
| Azure Automation | Microsoft.Automation/automationAccounts | Webhook, DSCAndHybridWorker |
| Azure Backup | Microsoft.RecoveryServices/vaults | AzureBackup, AzureSiteRecovery |
| Azure Batch | Microsoft.Batch/batchAccounts | batchAccount, nodeManagement |
| Azure Cosmos DB | Microsoft.AzureCosmosDB/databaseAccounts | SQL, MongoDB, Cassandra, Gremlin, Table |
| Azure Databricks | Microsoft.Databricks/workspaces | databricks_ui_api, browser_authentication |
| Azure Media Services | Microsoft.Media/mediaservices | keydelivery, liveevent, streamingendpoint |
| Azure Storage | Microsoft.Storage/storageAccounts | Blob (blob, blob_secondary), Table (table, table_secondary), Queue (queue, queue_secondary), File (file, file_secondary), Web (web, web_secondary), Dfs (dfs, dfs_secondary) |
| Azure Synapse Analytics | Microsoft.Synapse/workspaces | Sql, SqlOnDemand, Dev |

Closes
- Azure/Azure-Verified-Modules#620
- #962
- #946
- #1042

AVM Issues
- Azure/Azure-Verified-Modules#621

## Pipeline Reference

| Pipeline |
| -------- |
| [![avm.res.automation.automation-account](https://github.com/segraef/bicep-registry-modules/actions/workflows/avm.res.automation.automation-account.yml/badge.svg?branch=fix%2Fpe-schema)](https://github.com/segraef/bicep-registry-modules/actions/workflows/avm.res.automation.automation-account.yml) |
| [![avm.res.batch.batch-account](https://github.com/segraef/bicep-registry-modules/actions/workflows/avm.res.batch.batch-account.yml/badge.svg?branch=fix%2Fpe-schema)](https://github.com/segraef/bicep-registry-modules/actions/workflows/avm.res.batch.batch-account.yml) |
| [![avm.res.databricks.workspace](https://github.com/segraef/bicep-registry-modules/actions/workflows/avm.res.databricks.workspace.yml/badge.svg?branch=fix%2Fpe-schema)](https://github.com/segraef/bicep-registry-modules/actions/workflows/avm.res.databricks.workspace.yml) |
| [![avm.res.document-db.database-account](https://github.com/segraef/bicep-registry-modules/actions/workflows/avm.res.document-db.database-account.yml/badge.svg?branch=fix%2Fpe-schema)](https://github.com/segraef/bicep-registry-modules/actions/workflows/avm.res.document-db.database-account.yml) |
| [![avm.res.storage.storage-account](https://github.com/segraef/bicep-registry-modules/actions/workflows/avm.res.storage.storage-account.yml/badge.svg?branch=fix%2Fpe-schema)](https://github.com/segraef/bicep-registry-modules/actions/workflows/avm.res.storage.storage-account.yml) |
| [![avm.res.synapse.workspace](https://github.com/segraef/bicep-registry-modules/actions/workflows/avm.res.synapse.workspace.yml/badge.svg?branch=fix%2Fpe-schema)](https://github.com/segraef/bicep-registry-modules/actions/workflows/avm.res.synapse.workspace.yml) |



## Type of Change


- [ ] Update to CI Environment or utilities (Non-module affecting changes)
- [x] Azure Verified Module updates:
  - [ ] Bugfix containing backwards compatible bug fixes, and I have NOT bumped the MAJOR or MINOR version in `version.json`:
    - [x] Someone has opened a bug report issue, and I have included "Closes #{bug_report_issue_number}" in the PR description.
    - [ ] The bug was found by the module author, and no one has opened an issue to report it yet.
  - [x] Feature update backwards compatible feature updates, and I have bumped the MINOR version in `version.json`.
  - [ ] Breaking changes and I have bumped the MAJOR version in `version.json`.
  - [x] Update to documentation

## Checklist

- [x] I'm sure there are no other open Pull Requests for the same update/change
- [x] I have run `Set-AVMModule` locally to generate the supporting module files.
- [x] My corresponding pipelines / checks run clean and green without any errors or warnings


---------

Co-authored-by: Kris Baranek <[email protected]>
@AlexanderSehr AlexanderSehr linked a pull request Feb 27, 2024 that will close this issue
8 tasks
@AlexanderSehr
Contributor

AlexanderSehr commented Feb 29, 2024

Dear @microsoft-github-policy-service, I know, and the PRs are open. You may stop now.

@segraef
Contributor

segraef commented Mar 3, 2024

Implementation in progress, see linked PRs.
