Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Tool: Data API Differ #3462

Merged
merged 20 commits into from
Dec 15, 2023
Merged

New Tool: Data API Differ #3462

merged 20 commits into from
Dec 15, 2023

Conversation

tombuildsstuff
Copy link
Contributor

This PR introduces a new tool data-api-differ which detects and then outputs the changes between two different sets of API Definitions.

The intention here is to provide a replacement for the older extract-resource-id-segments tool - but also to output a more granular diff of what’s happened so that anyone looking at a PR which updates the API Definitions has a clear understanding of what’s changing in that Pull Request.

This tool detects three types of changes:

  1. Breaking Changes - for example when a Service/API Version/API Resource/Constant/Field/Model/Operation/Resource ID gets removed, or when a Resource ID/Options object gets added to an Operation.
  2. Breaking and Non-Breaking Changes - detects and outputs both the Breaking and Non-Breaking Changes - for example where an API Version gets added, a new Constant Key/Value gets added or a Service gets removed.
  3. New Static Identifiers present within New Resource IDs - this determines any Static Identifiers (e.g. the name of the resource provider or the fixed value for a Static Resource ID Segment) and then outputs a unique, sorted list.

A shortened example of the output for each of these commands can be seen in the README - but the output from an earlier build can be found in this Gist for review.

This tool detects each Type of Change as unique struct - meaning that it’s possible to quickly add more granular reports as needed - and I suspect we’ll want to extend this to support diffing two different Versions of two APIs (e.g. 2020-01-01 and 2023-06-01) in the future.

This PR also adds the associated automation, running this tool against Pull Requests to be able to detect changes between main and a given Pull Request.

Fixes #3315

Notes / Limitations

  • When adding a new Service/API Version at this point in time the entire diff is output in one long block - it’s likely we can change how this is presented to make this easier to review.
  • At this point in time the tool is limited to Resource Manager until the Data API SDK is updated and consolidated - but that should be easy to extend in the future (once Data API V2 is fully rolled out). Presumably we’ll want to group the Changes against by Data Source (Microsoft Graph / Resource Manager) when that happens.
  • At this point the changes being output assume an understanding of the values (e.g. Model has a new Parent Type) has no backing documentation. We should look to add documentation to ./docs which explains what each of the types of changes are, and the consequence of this/if these are resolved.
  • At this point time this isn’t checking if the Services are used within hashicorp/terraform-provider-azurerm (or hashicorp/terraform-provider-azuread) but that seems like a logical extension to be able to determine when a breaking change is fine (e.g. removing an unused old Service/API Version) and when it’s problematic (e.g. removing a Service/API Version which is currently vendored into the Provider[s]).
  • At this point in time this tool doesn’t detect changes to the Terraform Resources, that’s left as a logical extension point for the future.
  • Arguably the tool could do with an updated name, @stephybun suggested data-api-reporter - however given it could be worth extending this in the future (e.g. so that it integrates with the upgrade-go-azure-sdk tool) - I suspect it’s probably worth leaving this as differ for the moment?

This commit introduces one struct per Type of Change - allowing us to pick out later
on which Changes are relevant for which Command.
…ls for now

This is necessary until the Data API SDK is updated - but can be reconciled later
This'll need to be configured both in any Tests and the Main function, but allows
this to be switched out as needed
This allows detecting the changes between two sets of Resource Manager Services, but
in the future can be extended to also support Microsoft Graph when the SDK exposes it.

This allows detecting of changes in:

* Services
* API Versions
* API Resources
* Constants
* Models
* Operations (and Operation Options)
* Resource IDs (and Resource ID Segments)

This currently _doesn't_ support tracking changes to Terraform Resources, but that seems
like a good logical extension point for the future.
…list of Static Identifiers

This allows us to compute the list of Resource ID Segments which are new/updated so we can output a static list
This makes use of a "View" concept similar to Terraform Core, the intention here being to allow us the ability to output this as JSON
in the future for interoperability with other tooling (such as the `upgrade-go-azure-sdk` tool in `hashicorp/go-azure-sdk`).

The Changes and Breaking Changes UIs require further work to clean these up - this is very much an MVP showing all of the available data
but in the future I suspect we'll change this to be grouped by Data Type (e.g. Resource Manager/MS Graph) and Service and have the nested
information expandable, or something. But this is fine for an MVP.

Noteably this will output a LOT of data in the short-term - showing ALL changes that have happened to a Service/API Version/API Resource/etc
when it's added - but shows only the top-level removal (e.g. the Removal of the Service, and not each API version) when the Service and all
nested items are removed.
These might seem excessive, but since we're relying on this output this is worth testing.
This helps determine when the Data API has launched, and on which port.
In the event that `data-api-binary-path` is unspecified, we now look for this on the PATH which is helpful when running in automation.
This allows outputting the rendered output to a file, which makes it possible to use the result in automation.
…n output the result on Pull Requests

This triggers on changes to the API Definitions (to handle these changing), changes to the Data API (so we can see the result) and
on changes to `importer-rest-api-specs`. This currently doesn't run for changes to `importer-msgraph-metadata` but can be added when
the Data API SDK is updated in the near future.
This is used by the Data API Differ automation
This means that we have access to the `main` branch to be able to clone it
Copy link
Member

@mbfrahry mbfrahry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Nothing is jumping out as a negative other than potential noise-iness around information we don't care about so it'll be fun to see what information we get out of this first pass and how we can tweak it

@tombuildsstuff
Copy link
Contributor Author

Yeah I'm anticipating that we'll change the output UI after having learned more about what we're after - the information being output is useful but a little unwieldily at the moment - but we can change that more easily once we've got some more examples.

@tombuildsstuff tombuildsstuff merged commit 5b5165d into main Dec 15, 2023
9 checks passed
@tombuildsstuff tombuildsstuff deleted the f/data-api-differ branch December 15, 2023 09:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data API V2: Custom differ
2 participants