Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Preflight Validate API support #4329

Open
wants to merge 22 commits into
base: main
Choose a base branch
from
Open

Conversation

hemarina
Copy link
Contributor

@hemarina hemarina commented Sep 17, 2024

Adds support for Preflight Validate API.

TODO:

  • Tests

Questions

What is Preflight?

Preflight API validates whether the specified template is syntactically correct and will be accepted by Azure Resource Manager. More information on preflight wiki.

Why do we implement Validate API?

There’re two preflight API: Validate API and Deployment API. We implemented deployment API. However, deployment API creates a scenario where specific resource fails at preflight API and stops its deployment where some resources are already created. For example, resources are created at an unsupported location. This will cause more dev-test loop for cleaning resources and re-deploy. Adding Validate API support before provisioning to azure solves this situation.

What are Preflight Validate API limitations?

  1. Around 60% applicable resource types are enabled with preflight. Check preflight coverage dashboard.
  2. Preflight is not able to validate nested resources with parameters referencing runtime functions.

Example Test Error Message

Standard deployments

image

Deployment stack

image

hemarina and others added 6 commits August 14, 2024 16:04
To assign the process to the job object after we create it, we need
the `windows.Handle` that backs the process we started with `os/exec`.
This was not exported and so we used some gnarly casting to cast the
`os.Process` to a structure we authored that had the same layout of
`os.Process` so we could grab the handle property.

This is unsafe if the layout of the `os.Process` changes, which it did
in Go 1.23.

Move to code that doesn't depend on the internals of `os.Process` by
using the exported Pid to call `windows.OpenProcess` to get a handle and
then pass that along to `windows.AssignProcessToJobObject`.
@hemarina hemarina self-assigned this Sep 17, 2024
@hemarina hemarina marked this pull request as ready for review September 20, 2024 21:57
Copy link
Contributor

@wbreza wbreza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the expectation for templates with nested deployments? For example, I used our Todo AKS template, specified an invalid K8s version parameter and expected that this would fail at our own preflight check call.

Instead the preflight from azd passes, but then we get a preflight failure during ARM deployment.

Preflight validation check for resource(s) for container service aks-oc3hr2l7nqlee in resource group rg-wabrez-todo-aks-swed failed. Message: Web application routing requires at least Kubernetes 1.22. Details: 

 (Code: WebAppRoutingUnsupportedK8SVersion)

Failing Deployment Example

If it is not going to catch preflight errors on nested deployments I question whether it is valuable to add at this point.

@hemarina hemarina requested review from wbreza and weikanglim November 1, 2024 21:27
@hemarina
Copy link
Contributor Author

@weikanglim @wbreza @vhvb1989 @ellismg Could you take another pass of this PR?

@hemarina
Copy link
Contributor Author

hemarina commented Nov 20, 2024

What is the expectation for templates with nested deployments? For example, I used our Todo AKS template, specified an invalid K8s version parameter and expected that this would fail at our own preflight check call.

Instead the preflight from azd passes, but then we get a preflight failure during ARM deployment.

Preflight validation check for resource(s) for container service aks-oc3hr2l7nqlee in resource group rg-wabrez-todo-aks-swed failed. Message: Web application routing requires at least Kubernetes 1.22. Details: 

 (Code: WebAppRoutingUnsupportedK8SVersion)

Failing Deployment Example

If it is not going to catch preflight errors on nested deployments I question whether it is valuable to add at this point.

We have discussed about this matter during standup. This is because of nested deployments with parameters referencing runtime functions are not supported by preflight API. The current changes will help users with their custom templates if they're not using nested resources. Jonny from preflight validate API team is working on the fixes of skipped validation on nested deployment, and he expects to release the fix "sometime within this semester".

@hemarina
Copy link
Contributor Author

hemarina commented Nov 20, 2024

Rechecking about the templates with current use case, we might want to wait to merge the PR after preflight team's fix. There're templates like Azure-Samples/cosmos-db-nosql-go-quickstart that directly using main.bicep but the bicep file is still using module which will cause a validation skip on preflight api. Then, the current use case for this PR will be for users who only using single bicep file without nested resources and this is very rare case.

@hemarina hemarina added the hold label Nov 21, 2024
@hemarina
Copy link
Contributor Author

hemarina commented Nov 21, 2024

We had discussion about future action of this PR between hold the preflight PR, merge the preflight PR but not refer it until fix is released from preflight team or add it in alpha feature. And we decided to hold the preflight PR and merge it after preflight team released their fix on nested resources.

Summary of why we hold the PR:
Current issue: preflight API skip validation for nested resources and the release of fix from preflight team is planned before March ("sometime in this semester").
Pros: Allows users to run validation check if they are not using nested resources
Cons: For most cases, users won't get benefits because most deployments usually have nested resources. The API adds average 10 seconds of provision process.
Result: Merging the PR or adding it as alpha feature won't benefit users much since users usually use nested resources and it will cause a validation skip on preflight.

cc @rajeshkamal5050

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Spike] Preflight Checks API integration
4 participants