
Data API V2: Custom differ #3315

Closed
stephybun opened this issue Nov 9, 2023 · 1 comment · Fixed by #3462
Labels: enhancement (New feature or request), tool/data-api (Issues with the Data API)

Comments

@stephybun (Member) commented Nov 9, 2023:

The auto-generated PRs that regenerate the Data API definitions often contain many changes that are time-consuming to review and validate. We need an automated way to summarise those changes and present them in a user-friendly manner, to expedite the review process and reduce the chance of errors or problems being overlooked and creeping into the provider.

Common scenarios that need to be covered by this differ are:

  • Extracting a unique list of resource IDs that have been changed or added, so that their segments can be checked for correct casing (this functionality can be taken from the existing extract-tf-resource-ids job)
  • When definitions are removed from a service's API version, automatically detecting whether the provider imports that version and uses any of the endpoints that have been modified or removed, and flagging breaking changes
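A minimal sketch of how the differ might compare the two sides, assuming it already has flat lists of resource IDs (or endpoints) extracted from main and from the current branch; the function name and the sample IDs below are illustrative, not part of the existing tooling:

```go
package main

import (
	"fmt"
	"sort"
)

// diffIDs compares the resource IDs present on main against those on the
// current branch, returning the IDs that were added and removed so they
// can be surfaced for review (e.g. segment casing checks, breaking changes).
func diffIDs(mainIDs, branchIDs []string) (added, removed []string) {
	onMain := make(map[string]bool, len(mainIDs))
	for _, id := range mainIDs {
		onMain[id] = true
	}
	onBranch := make(map[string]bool, len(branchIDs))
	for _, id := range branchIDs {
		onBranch[id] = true
		if !onMain[id] {
			added = append(added, id)
		}
	}
	for _, id := range mainIDs {
		if !onBranch[id] {
			removed = append(removed, id)
		}
	}
	// Sort for deterministic, reviewer-friendly output.
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	mainIDs := []string{
		"/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}",
		"/subscriptions/{subscriptionId}/providers/Microsoft.Compute/skus",
	}
	branchIDs := []string{
		"/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}",
		"/subscriptions/{subscriptionId}/providers/Microsoft.Network/virtualNetworks",
	}
	added, removed := diffIDs(mainIDs, branchIDs)
	fmt.Println("added:", added)
	fmt.Println("removed:", removed)
}
```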
stephybun added the enhancement and tool/data-api labels Nov 9, 2023
@tombuildsstuff (Contributor) commented Nov 10, 2023:

Calling out a design limitation from the current setup / a topic that we've previously talked about here too:

The V1 tool (extract-tf-resource-ids) currently has a hard-dependency on the GitHub API which means this tool can't be used locally.

In addition, the GitHub API uses pagination when the changeset associated with a pull request contains over 3,000 file changes, which the V1 tool isn't accounting for (meaning that we're not outputting all the changes today, example) - and whilst we could update the tool to handle pagination, switching to a local setup removes this limitation.
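For reference, handling pagination correctly is mostly a matter of decoupling the page loop from the HTTP client. A hedged sketch, where the fetcher signature and the fake data are illustrative stand-ins for the real GitHub API call:

```go
package main

import "fmt"

// fetchPage returns the items on a given 1-indexed page plus whether more
// pages remain; in the real tool this would wrap a GitHub API request for
// the files changed in a pull request.
type fetchPage func(page int) (items []string, hasMore bool)

// collectAll walks every page until the fetcher reports no more results,
// so no changed files are silently dropped from the diff.
func collectAll(fetch fetchPage) []string {
	var all []string
	for page := 1; ; page++ {
		items, hasMore := fetch(page)
		all = append(all, items...)
		if !hasMore {
			return all
		}
	}
}

func main() {
	// Fake three-page fetcher standing in for the GitHub files endpoint.
	data := [][]string{{"a.go", "b.go"}, {"c.go"}, {"d.go"}}
	fetch := func(page int) ([]string, bool) {
		return data[page-1], page < len(data)
	}
	fmt.Println(collectAll(fetch)) // all four files, across three pages
}
```

That said, as the comment notes, the local setup below avoids the GitHub API entirely, which makes this moot.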

As such, rather than relying on the contents of local files, presumably it'd be better to query the Data API to get the data from both main and the current branch and then perform the diff, which would presumably be easiest by having the tool do something along these lines:

  1. Cloning a copy of the main branch from the current working directory (e.g. this repository - intentionally not main from GitHub) into a temporary directory (e.g. git clone -b main /path/to/working/directory /temp/directory, but presumably in code, which looks to be possible via the options).
  2. Spinning up an instance of the Data API from that branch (to give us a baseline) - specifying the port the Data API should use as an Environment Variable.
  3. Having the tool retrieve the data from the main version of the Data API.
  4. Shutting that version of the Data API down.
  5. Launching the Data API from the current working directory - specifying the port the Data API should use as an Environment Variable.
  6. Having the tool retrieve the data from that version of the Data API.
  7. Shutting that version of the Data API down.
  8. Performing the diff - optionally writing it to a file if specified.
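The clone-and-launch steps above could be sketched roughly as follows. This is a non-authoritative sketch: the Data API entry-point path (`./cmd/data-api`) and the environment variable name (`PANDORA_API_PORT`) are assumptions for illustration, not the tool's actual names:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// cloneMain builds the `git clone -b main <workingDir> <tempDir>` command,
// cloning main from the local working copy rather than from GitHub.
func cloneMain(workingDir, tempDir string) *exec.Cmd {
	return exec.Command("git", "clone", "-b", "main", workingDir, tempDir)
}

// launchDataAPI builds the command to start the Data API from a given
// checkout, passing the listen port via an environment variable.
// NOTE: both the entry-point path and the variable name are assumptions.
func launchDataAPI(checkoutDir string, port int) *exec.Cmd {
	cmd := exec.Command("go", "run", "./cmd/data-api")
	cmd.Dir = checkoutDir
	// Inherit the parent environment, then append the port override.
	cmd.Env = append(os.Environ(), fmt.Sprintf("PANDORA_API_PORT=%d", port))
	return cmd
}

func main() {
	fmt.Println(cloneMain("/path/to/working/directory", "/tmp/baseline").Args)
	fmt.Println(launchDataAPI("/tmp/baseline", 8080).Dir)
}
```

The remaining steps (retrieve, shut down, diff) would then run each command with `cmd.Start()`/`Process.Kill()` and query the API on the chosen port.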

This approach would mean that we're able to run this locally - but it would also allow us to output more semantic types of changes in the future (e.g. highlighting when a new Discriminator implementation gets added, or flagging when the value of a constant changes [e.g. casing]) - in addition to what we're doing today (around the Resource ID segments).

In addition, it means that when this is run in GitHub and an output file for the diff is specified, we can have that posted as a comment on the pull request when run in automation (referencing how hashicorp/go-azure-sdk does this) - meaning the tool doesn't need to interact with GitHub at all.

In order to do that, I suspect we'd want to make a couple of changes to the Data API:

  1. Adding support for launching the Data API on a given port (making life easier when running in automation/avoiding conflicts when running the Data API locally) - the automation already has an Environment Variable defined for this fwiw. (Tracked in Data API V2: Changes needed to enable Automation #3323)
  2. Adding an optional flag for --data-directory, to allow overriding the path to the ./api-definitions directory. This means we could launch the same compiled version of the Data API once, rather than compiling it twice, in the above steps - which feels beneficial? (Tracked in Data API V2: Changes needed to enable Automation #3323)

WDYT?
