azurerm_machine_learning_compute_cluster - Add support for update identity #26404

Merged
9 commits merged on Jul 8, 2024

Contributor

@xuzhang3 xuzhang3 commented Jun 20, 2024

The current API can update identity without recreating the MLW compute cluster.

Community Note

  • Please vote on this PR by adding a 👍 reaction to the original PR to help the community and maintainers prioritize for review
  • Please do not leave comments along the lines of "+1", "me too" or "any updates", they generate extra noise for PR followers and do not help prioritize for review

Description

PR Checklist

  • I have followed the guidelines in our Contributing Documentation.
  • I have checked to ensure there aren't other open Pull Requests for the same update/change.
  • I have checked if my changes close any open issues. If so please include appropriate closing keywords below.
  • I have updated/added Documentation as required written in a helpful and kind way to assist users that may be unfamiliar with the resource / data source.
  • I have used a meaningful PR title to help maintainers and other users understand this change and help prevent duplicate work.
    For example: “resource_name_here - description of change e.g. adding property new_property_name_here”

Changes to existing Resource / Data Source

  • I have added an explanation of what my changes do and why I'd like you to include them (This may be covered by linking to an issue above, but may benefit from additional explanation).
  • I have written new tests for my resource or datasource changes & updated any relevant documentation.
  • I have successfully run tests with my changes locally. If not, please provide details on testing challenges that prevented you running the tests.
  • (For changes that include a state migration only). I have manually tested the migration path between relevant versions of the provider.

Testing

  • My submission includes Test coverage as described in the Contribution Guide and the tests pass. (if this is not possible for any reason, please include details of why you did or could not add test coverage)

Change Log

Below please provide what should go into the changelog (if anything) conforming to the Changelog Format documented here.

  • azurerm_resource - support for the thing1 property [GH-00000]

This is a (please select all that apply):

  • Bug Fix
  • New Feature (ie adding a service, resource, or data source)
  • Enhancement
  • Breaking Change

Related Issue(s)

Fixes #0000

Note

If this PR changes meaningfully during the course of review please update the title and description as required.

@xuzhang3
Contributor Author

image

@xuzhang3 xuzhang3 marked this pull request as ready for review June 20, 2024 09:13
@Uranium2

This feature is a must-have for me.

Today, when a new user joins an AAD group, I get this user to create a user-managed identity that I assign to a list of compute clusters. This forces each of those compute clusters to be recreated. If a compute cluster was in use, the recreation interrupts the running Job/Schedule.

So we have to wait until the end of the day, when nobody is using the compute clusters, to deploy a new user on our platform. (The user-managed identity is also used for other resources such as Databricks.)

Member

@stephybun stephybun left a comment

Thanks @xuzhang3. Would you mind looking at the comments left in-line?

return err
}

workspace, err := mlWorkspacesClient.Get(ctx, *workspaceID)
Member

Why do we need to get the workspace here? Can't we retrieve all the info on the Compute Cluster by calling client.ComputeGet?

Contributor Author

The compute cluster resource uses the SKU of the workspace, and the compute GET API returns nil for the SKU instead of that value.

Member

That sounds like a bug we should raise against the REST API spec. Can you please open one?

Comment on lines 431 to 437
future, err := client.ComputeCreateOrUpdate(ctx, *id, computeClusterParameters)
if err != nil {
	return fmt.Errorf("creating %s: %+v", id, err)
}
if err := future.Poller.PollUntilDone(ctx); err != nil {
	return fmt.Errorf("waiting for creation of %s: %+v", id, err)
}
Member

Suggested change
if err := client.ComputeCreateOrUpdateThenPoll(ctx, *id, computeClusterParameters); err != nil {
	return fmt.Errorf("updating %s: %+v", id, err)
}

Comment on lines 391 to 429
vmPriority := machinelearningcomputes.VMPriority(d.Get("vm_priority").(string))
computeClusterAmlComputeProperties := machinelearningcomputes.AmlComputeProperties{
	VMSize:                 utils.String(d.Get("vm_size").(string)),
	VMPriority:             &vmPriority,
	ScaleSettings:          expandScaleSettings(d.Get("scale_settings").([]interface{})),
	UserAccountCredentials: expandUserAccountCredentials(d.Get("ssh").([]interface{})),
	EnableNodePublicIP:     pointer.To(d.Get("node_public_ip_enabled").(bool)),
}

computeClusterAmlComputeProperties.RemoteLoginPortPublicAccess = pointer.To(machinelearningcomputes.RemoteLoginPortPublicAccessDisabled)
if d.Get("ssh_public_access_enabled").(bool) {
	computeClusterAmlComputeProperties.RemoteLoginPortPublicAccess = pointer.To(machinelearningcomputes.RemoteLoginPortPublicAccessEnabled)
}

if subnetId, ok := d.GetOk("subnet_resource_id"); ok && subnetId.(string) != "" {
	computeClusterAmlComputeProperties.Subnet = &machinelearningcomputes.ResourceId{Id: subnetId.(string)}
}

// NOTE: The 'AmlCompute' 'ComputeLocation' field should always point
// to configuration files 'location' field...
computeClusterProperties := machinelearningcomputes.AmlCompute{
	Properties:       &computeClusterAmlComputeProperties,
	ComputeLocation:  utils.String(d.Get("location").(string)),
	Description:      utils.String(d.Get("description").(string)),
	DisableLocalAuth: utils.Bool(!d.Get("local_auth_enabled").(bool)),
}

// NOTE: The 'ComputeResource' 'Location' field should always point
// to the workspace's 'location'...
computeClusterParameters := machinelearningcomputes.ComputeResource{
	Properties: computeClusterProperties,
	Identity:   identity,
	Location:   workspaceModel.Location,
	Tags:       tags.Expand(d.Get("tags").(map[string]interface{})),
	Sku: &machinelearningcomputes.Sku{
		Name: workspaceModel.Sku.Name,
		Tier: pointer.To(machinelearningcomputes.SkuTier(*workspaceModel.Sku.Tier)),
	},
}
Member

We should still be able to retrieve the existing Compute Cluster and patch the SKU from the workspace into the model instead of having to set everything from the config like in the create?

Contributor Author

Or we could try updating without the SKU? If that works, we don't need to get the workspace in the update.

Member

Sure, try it

Contributor Author

Tried it: the update works without the SKU. The API doc also says this SKU is the SKU of the workspace (https://learn.microsoft.com/en-us/rest/api/azureml/compute/create-or-update?view=rest-azureml-2024-04-01&tabs=HTTP#request-body). So we can either ignore it or get it from the workspace. Do we need to remove it in the update?

Member

If you can update successfully without sending anything for the SKU (and it doesn't change the SKU) then it's fine to omit getting it from the workspace.
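
To make "not sending anything for the SKU" concrete, here is a small sketch of how a nil pointer field tagged `omitempty` disappears from the JSON request body. The `Sku` and `ComputeResource` structs below are hypothetical, trimmed-down stand-ins, and the `omitempty` tag on the SDK's real model is an assumption of this example, not something confirmed in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Sku is a hypothetical stand-in for the SDK's SKU model.
type Sku struct {
	Name string `json:"name"`
}

// ComputeResource is a hypothetical stand-in with only two fields.
// The omitempty tags are an assumption made for this sketch.
type ComputeResource struct {
	Location *string `json:"location,omitempty"`
	Sku      *Sku    `json:"sku,omitempty"`
}

// encode returns the JSON body that would be sent for the given payload.
func encode(payload ComputeResource) (string, error) {
	b, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	location := "westeurope"

	// Sku left nil: the key is absent from the body entirely.
	withoutSku, err := encode(ComputeResource{Location: &location})
	if err != nil {
		fmt.Println("encoding:", err)
		return
	}
	fmt.Println(withoutSku) // {"location":"westeurope"}

	// Sku set: the key is present.
	withSku, err := encode(ComputeResource{Location: &location, Sku: &Sku{Name: "Basic"}})
	if err != nil {
		fmt.Println("encoding:", err)
		return
	}
	fmt.Println(withSku) // {"location":"westeurope","sku":{"name":"Basic"}}
}
```

Whether the service keeps the existing SKU when the key is absent is exactly what the manual test in this thread checked; the sketch only shows the client side.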

@mybayern1974
Collaborator

This PR is expected to fix #25883

@xuzhang3
Contributor Author

Updated as requested, and all tests passed.
image

Member

@stephybun stephybun left a comment

Please take a look at the comment left in-line. Furthermore, the test isn't properly testing the update of identity.

The first update is triggering a ForceNew on the resource because the property local_auth_enabled has changed. Can you please make sure the test configuration for this update test is not changing any ForceNew properties?
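
The problem described above can be sketched as a simple check: a test step only exercises the in-place update path if none of the properties it changes are ForceNew. The `field` type and `forcesReplacement` helper below are hypothetical and greatly simplified; they are not the Terraform plugin SDK's schema types, and the flags are taken only from what this review says about `local_auth_enabled` and `identity`.

```go
package main

import (
	"fmt"
	"sort"
)

// field is a hypothetical, simplified schema entry; only ForceNew matters here.
type field struct {
	ForceNew bool
}

// forcesReplacement lists the changed properties that would force the
// resource to be destroyed and recreated instead of updated in place.
func forcesReplacement(schema map[string]field, before, after map[string]string) []string {
	var offenders []string
	for name, f := range schema {
		if f.ForceNew && before[name] != after[name] {
			offenders = append(offenders, name)
		}
	}
	sort.Strings(offenders)
	return offenders
}

func main() {
	// Flags follow the review comment: local_auth_enabled forces a new
	// resource, identity is meant to be updatable in place.
	schema := map[string]field{
		"local_auth_enabled": {ForceNew: true},
		"identity":           {ForceNew: false},
	}
	before := map[string]string{"local_auth_enabled": "true", "identity": "SystemAssigned"}

	// Step that also flips local_auth_enabled: the resource gets recreated,
	// so the identity update path is never exercised.
	badStep := map[string]string{"local_auth_enabled": "false", "identity": "UserAssigned"}
	fmt.Println(forcesReplacement(schema, before, badStep)) // [local_auth_enabled]

	// Step that changes only identity: a genuine in-place update.
	goodStep := map[string]string{"local_auth_enabled": "true", "identity": "UserAssigned"}
	fmt.Println(forcesReplacement(schema, before, goodStep)) // []
}
```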

Comment on lines 370 to 419
compute, err := client.ComputeGet(ctx, *id)
if err != nil {
	return fmt.Errorf("retrieving %s: %+v", *id, err)
}

computeModel := compute.Model
if computeModel == nil {
	return fmt.Errorf("retrieving %s: `model` was nil", *id)
}

identity, err := expandIdentity(d.Get("identity").([]interface{}))
if err != nil {
	return fmt.Errorf("expanding `identity`: %+v", err)
}

vmPriority := machinelearningcomputes.VMPriority(d.Get("vm_priority").(string))
computeClusterAmlComputeProperties := machinelearningcomputes.AmlComputeProperties{
	VMSize:                 utils.String(d.Get("vm_size").(string)),
	VMPriority:             &vmPriority,
	ScaleSettings:          expandScaleSettings(d.Get("scale_settings").([]interface{})),
	UserAccountCredentials: expandUserAccountCredentials(d.Get("ssh").([]interface{})),
	EnableNodePublicIP:     pointer.To(d.Get("node_public_ip_enabled").(bool)),
}

computeClusterAmlComputeProperties.RemoteLoginPortPublicAccess = pointer.To(machinelearningcomputes.RemoteLoginPortPublicAccessDisabled)
if d.Get("ssh_public_access_enabled").(bool) {
	computeClusterAmlComputeProperties.RemoteLoginPortPublicAccess = pointer.To(machinelearningcomputes.RemoteLoginPortPublicAccessEnabled)
}

if subnetId, ok := d.GetOk("subnet_resource_id"); ok && subnetId.(string) != "" {
	computeClusterAmlComputeProperties.Subnet = &machinelearningcomputes.ResourceId{Id: subnetId.(string)}
}

computeClusterProperties := machinelearningcomputes.AmlCompute{
	Properties:       &computeClusterAmlComputeProperties,
	ComputeLocation:  utils.String(d.Get("location").(string)),
	Description:      utils.String(d.Get("description").(string)),
	DisableLocalAuth: utils.Bool(!d.Get("local_auth_enabled").(bool)),
}

computeClusterParameters := machinelearningcomputes.ComputeResource{
	Properties: computeClusterProperties,
	Identity:   identity,
	Location:   computeModel.Location,
	Tags:       tags.Expand(d.Get("tags").(map[string]interface{})),
}

if err := client.ComputeCreateOrUpdateThenPoll(ctx, *id, computeClusterParameters); err != nil {
	return fmt.Errorf("updating %s: %+v", id, err)
}
Member

@xuzhang3 my original comment hasn't been resolved here.

Like we do in many other updates, we get the existing resource, and patch in the changes to the existing resource's model and use that as the payload for the CreateOrUpdate call. This is explained in our guide for adding new resources in the Contributor Docs.

This whole block should be simplified to

Suggested change
existing, err := client.ComputeGet(ctx, *id)
if err != nil {
	return fmt.Errorf("retrieving %s: %+v", *id, err)
}
payload := existing.Model
if payload == nil {
	return fmt.Errorf("retrieving %s: `model` was nil", *id)
}
identity, err := expandIdentity(d.Get("identity").([]interface{}))
if err != nil {
	return fmt.Errorf("expanding `identity`: %+v", err)
}
payload.Identity = identity
if err := client.ComputeCreateOrUpdateThenPoll(ctx, *id, *payload); err != nil {
	return fmt.Errorf("updating %s: %+v", id, err)
}
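
The get-then-patch pattern requested here can be illustrated in isolation. The sketch below uses a hypothetical in-memory client (`memoryClient`) and a hypothetical model (`clusterModel`, `managedIdentity`), none of which come from the provider or the SDK; it only shows that fetching the existing model and overwriting one field leaves every other property as the service last reported it.

```go
package main

import (
	"errors"
	"fmt"
)

// managedIdentity and clusterModel are hypothetical stand-ins for the SDK models.
type managedIdentity struct {
	Type string
}

type clusterModel struct {
	Location string
	VMSize   string
	Identity *managedIdentity
}

// memoryClient is a hypothetical in-memory replacement for the API client.
type memoryClient struct {
	store map[string]clusterModel
}

// Get returns a copy of the stored model, or an error if it does not exist.
func (c *memoryClient) Get(id string) (*clusterModel, error) {
	m, ok := c.store[id]
	if !ok {
		return nil, errors.New("not found")
	}
	return &m, nil
}

// CreateOrUpdate replaces the stored model with the given payload.
func (c *memoryClient) CreateOrUpdate(id string, payload clusterModel) error {
	c.store[id] = payload
	return nil
}

// updateIdentity fetches the current model, patches only the identity,
// and sends the whole model back as the update payload.
func updateIdentity(c *memoryClient, id string, identity *managedIdentity) error {
	existing, err := c.Get(id)
	if err != nil {
		return fmt.Errorf("retrieving %s: %w", id, err)
	}
	existing.Identity = identity
	if err := c.CreateOrUpdate(id, *existing); err != nil {
		return fmt.Errorf("updating %s: %w", id, err)
	}
	return nil
}

func main() {
	client := &memoryClient{store: map[string]clusterModel{
		"cluster-1": {
			Location: "westeurope",
			VMSize:   "STANDARD_DS2_V2",
			Identity: &managedIdentity{Type: "SystemAssigned"},
		},
	}}
	if err := updateIdentity(client, "cluster-1", &managedIdentity{Type: "UserAssigned"}); err != nil {
		fmt.Println(err)
		return
	}
	updated := client.store["cluster-1"]
	// Location and VMSize are untouched; only the identity changed.
	fmt.Println(updated.Location, updated.VMSize, updated.Identity.Type)
}
```

Compared with rebuilding the payload from configuration, this keeps the update function short and avoids resending values for properties that did not change.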

Contributor Author

Updated as requested, and all the tests passed.

@stephybun
Member

@xuzhang3 my review comment hasn't been addressed properly: #26404 (review)

@xuzhang3
Contributor Author

Test case updated.

Member

@stephybun stephybun left a comment

Thanks @xuzhang3 LGTM 👍

@stephybun stephybun merged commit a2b3769 into hashicorp:main Jul 8, 2024
34 checks passed
stephybun added a commit that referenced this pull request Jul 8, 2024
@github-actions github-actions bot added this to the v3.112.0 milestone Jul 8, 2024
dduportal pushed a commit to jenkins-infra/azure that referenced this pull request Jul 15, 2024

github-actions bot commented Aug 8, 2024

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 8, 2024
@xuzhang3 xuzhang3 deleted the f/mlw_comupte_cluster_identity branch August 14, 2024 02:38