Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

azurerm_nginx_deployment - Omit capacity when creating basic plans #26223

Merged
merged 1 commit into from
Aug 13, 2024

Conversation

ryepup
Copy link
Contributor

@ryepup ryepup commented Jun 4, 2024

The new Basic SKU does not support any scaling, and it is an error to provide Capacity or AutoScaleSettings. The resource is implemented in such a way that it was not possible to call
DeploymentsCreateOrUpdateThenPoll without one of these in the ScalingProperties, which is illegal for the Basic SKUs.

Currently the backend has a hack to silently drop ScalingProperties for Basic SKUs, but this is a confusing user experience:

  • initial terraform apply submits a PUT with "scalingProperties": {"capacity": 20} - this validation error is currently suppressed and the server reports back "scalingProperties": null

  • next terraform plan detects a capacity 0 -> 20

  • applying that plan requests a PATCH with "scalingProperties": {"capacity": 20}, that fails with a UnsupportedOnBasicPlan error

It gets more misleading if users specify a capacity explicity; capacity = 500 or auto_scale_profile will apply successfully, but get silently ignored server-side.

This patch checks for basic plans and omits the scalingProperties when the user hasn't changed from the default value. We already instruct users to ignore capacity, but the check is needed to prevent the implicit default value of 20 from getting into the DeploymentsCreateOrUpdateThenPoll request.

Community Note

  • Please vote on this PR by adding a 👍 reaction to the original PR to help the community and maintainers prioritize for review
  • Please do not leave "+1" or "me too" comments, they generate extra noise for PR followers and do not help prioritize for review

Description

PR Checklist

  • I have followed the guidelines in our Contributing Documentation.
  • I have checked to ensure there aren't other open Pull Requests for the same update/change.
  • I have checked if my changes close any open issues. If so please include appropriate closing keywords below.
  • I have updated/added Documentation as required written in a helpful and kind way to assist users that may be unfamiliar with the resource / data source.
  • I have used a meaningful PR title to help maintainers and other users understand this change and help prevent duplicate work.
    For example: “resource_name_here - description of change e.g. adding property new_property_name_here

Changes to existing Resource / Data Source

  • I have added an explanation of what my changes do and why I'd like you to include them (This may be covered by linking to an issue above, but may benefit from additional explanation).
  • I have written new tests for my resource or datasource changes & updated any relevent documentation.
  • I have successfully run tests with my changes locally. If not, please provide details on testing challenges that prevented you running the tests.
  • (For changes that include a state migration only). I have manually tested the migration path between relevant versions of the provider.

Testing

  • My submission includes Test coverage as described in the Contribution Guide and the tests pass. (if this is not possible for any reason, please include details of why you did or could not add test coverage)

Change Log

Below please provide what should go into the changelog (if anything) conforming to the Changelog Format documented here.

This is a (please select all that apply):

  • Bug Fix
  • New Feature (ie adding a service, resource, or data source)
  • Enhancement
  • Breaking Change

Related Issue(s)

Note

If this PR changes meaningfully during the course of review please update the title and description as required.

@agazeley
Copy link
Contributor

agazeley commented Jun 4, 2024

LGTM! 👍

@ryepup
Copy link
Contributor Author

ryepup commented Jun 4, 2024

@ryepup
Copy link
Contributor Author

ryepup commented Jun 26, 2024

@stephybun I see you've look at other PRs in this area, could I please get some help with a code review?

Copy link
Collaborator

@katbyte katbyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @ryepup, i've left some comments inline

internal/services/nginx/nginx_deployment_resource.go Outdated Show resolved Hide resolved
@@ -430,7 +436,10 @@ func (m DeploymentResource) Create() sdk.ResourceFunc {
prop.NetworkProfile.NetworkInterfaceConfiguration.SubnetId = pointer.FromString(model.NetworkInterface[0].SubnetId)
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add a check that returns an error if the plan is basic and the user attempts to set these properties?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that, but I don't think so. There's server side validation to fail fast with 400 Bad Request and a useful error message if the user attempts something like plan = "basic" and capacity = 30.

There are many server-side validations for the nginxdeployment.NginxDeployment{} data, and I didn't want to start duplicating that logic in the terraform provider. NGINXaaS evolves fairly rapidly, and I wanted to avoid bad UX coming from inconsistent validation logic if azurerm_nginx_deployment follows different rules than the azure CLI or portal.

Does that sound reasonable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@katbyte please let me know what you think, I'd love to get this merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@katbyte thoughts? I can add a check if you feel strongly about it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this PR is due to the server side validation code not handling this in a good way thou eh?

I would rather us check for bad data and return an error to the user over us silently not sending the data to the API that triggers a bad data response (which lacks a clear action for the user to take i presume) Not to mention if the user sets the value to 20 it will just silently not set it and also not trigger and server side validation.

so could we please change this to returning an error with basic and a capacity above zero rather then just omitting it? you might also have to change this to check if the property is set woth d.GetOkExists()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this PR is due to the server side validation code not handling this in a good way thou eh?

Not quite. This PR is due to azurerm_nginx_deployment applying a default capacity of 20, with no way for the user to specify "no capacity". The new "Basic" SKU does not support scaling of any kind, but azurerm_nginx_deployment was always sending a capacity.

I would rather us check for bad data and return an error to the user over us silently not sending the data to the API that triggers a bad data response (which lacks a clear action for the user to take i presume) Not to mention if the user sets the value to 20 it will just silently not set it and also not trigger and server side validation.

Agreed. The problem is there's no mechanism for a user to provide good capacity data for a basic SKU

resource "azurerm_nginx_deployment" "example" {
  sku  = "basic"
  # implicity gets capacity = 20, which is invalid
}

This weird conditional is an attempt to get TF to stop sending a capacity when the user doesn't ask for one. We currently have our server-side validation hacked to ignore capacity when for basic SKUs, but we want to remove that and return 400s with a good error message, like we do for other invalid combinations.

so could we please change this to returning an error with basic and a capacity above zero rather then just omitting it? you might also have to change this to check if the property is set woth d.GetOkExists()

Unfortunately due to the Default value, the capacity is always above zero regardless of what the user does.

I tried GetOkExists, but it still returns true when I omit capacity. Per the doc string, GetOkExists doesn't seem to work when there's a default value.

I didn't love the solution in this PR, it seemed like the least bad option that remained backward compatible. We considered a few alternatives:

  1. remove the Default: 20 - this would be a backwards incompatible change for TF users, and we didn't want to be that disruptive nor wait for the 4.0 release
  2. change the Capacity ValidationFunc to validation.IntAtLeast(0) and add docs saying you have to use capacity = 0 for basic SKUs - this felt like an extra step for users, and we were worried about allowing zeros into peoples' TF state setting up more backward compat problems in the future

Based on you're feedback, it sounds like the IntAtLeast(0) approach is preferable. I can update this PR to match, do you see any other options that might be better?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its worth noting the 4.0 release will be within the next month or so so that could be changed, it might make sense to just have it be computed without a default as i presume the API will set a default of 20? and then GetOkExists could work.

outside that as it is not supported with basic the check could simply be if model.Capacity > 0 && ! strings.HasPrefix(model.Sku, "basic") { ? with a error thrown if capacity is not 20. its not ideal as a user could specify a capacity and not realize its not being sent but that could be resolved later on as i think with framework we will have the ability to know if its not set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4.0 release will be within the next month or so

That's good to know, but I'd like to get something working for the 3.x users who might take awhile to upgrade.

the check could simply be... with a error thrown

Sounds good, I've updated this PR accordingly, and testing locally works as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated!

Copy link
Member

@stephybun stephybun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ryepup. Following the discussion thread in the PR it isn't clear to me whether this is still missing some changes for 4.0? i.e. removing the default value for capacity using the features.FourPointOhBeta flag?

internal/services/nginx/nginx_deployment_resource.go Outdated Show resolved Hide resolved
The new [Basic SKU][0] does not support any scaling, and it is an error to
provide `Capacity` or `AutoScaleSettings`. The resource is implemented
in such a way that it was not possible to call
`DeploymentsCreateOrUpdateThenPoll` without one of these in the
`ScalingProperties`, which is illegal for the Basic SKUs.

Currently the backend has a hack to silently drop `ScalingProperties`
for Basic SKUs, but this is a confusing user experience:

- initial `terraform apply` submits a `PUT` with `"scalingProperties":
{"capacity": 20}` - this validation error is currently suppressed and
the server reports back `"scalingProperties": null`

- next `terraform plan` detects a `capacity 0 -> 20`

- applying that plan requests a `PATCH` with `"scalingProperties":
{"capacity": 20}`, that fails with a `UnsupportedOnBasicPlan` error

It gets more misleading if users specify a capacity explicity;
`capacity = 500` or `auto_scale_profile` will apply successfully, but
get silently ignored server-side.

This patch checks for basic plans and omits the `scalingProperties`
when the user hasn't changed from the default value. We already
instruct users to ignore `capacity`, but the check is needed to
prevent the implicit default value of 20 from getting into the
`DeploymentsCreateOrUpdateThenPoll` request.

[0]: https://docs.nginx.com/nginxaas/azure/billing/overview/#basic-plan
@ryepup
Copy link
Contributor Author

ryepup commented Aug 12, 2024

Thanks @ryepup. Following the discussion thread in the PR it isn't clear to me whether this is still missing some changes for 4.0? i.e. removing the default value for capacity using the features.FourPointOhBeta flag?

@stephybun Changes for 4.0 will come in a different PR, this is just to get a fix in for the 3.x versions that our clients are currently using.

@stephybun stephybun added the bug label Aug 13, 2024
Copy link
Member

@stephybun stephybun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ryepup LGTM 👍

Just as a heads up 4.0 is targeted for next week, so any PRs with breaking changes should be raised soon to make it in.

@stephybun stephybun merged commit c4e3598 into hashicorp:main Aug 13, 2024
35 checks passed
@github-actions github-actions bot added this to the v3.116.0 milestone Aug 13, 2024
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 14, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants