
Data API V2: Custom differ #3315

Closed
stephybun opened this issue Nov 9, 2023 · 1 comment · Fixed by #3462
Labels: enhancement (New feature or request), tool/data-api (Issues with the Data API)

Comments

@stephybun (Member) commented Nov 9, 2023:

The auto-generated PRs that regenerate the Data API definitions often contain many changes that are time-consuming to review and validate. We need an automated way to summarise those changes and present them in a user-friendly manner, to expedite the review process and reduce the chance of errors or problems being overlooked and creeping into the provider.

Common scenarios that need to be covered by this differ are:

  • Extracting a unique list of resource IDs that have been changed or added, so that their segments can be checked for correct casing (this functionality can be taken from the existing extract-tf-resource-ids job)
  • When definitions are removed from a service's API version, automatically detecting whether the provider imports that version and uses any of the endpoints that have been modified or removed, and flagging breaking changes
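A minimal sketch of how the differ might compare the two sides, assuming it already has flat lists of resource IDs (or endpoints) extracted from main and from the current branch; the function name and the sample IDs below are illustrative, not part of the existing tooling:

```go
package main

import (
	"fmt"
	"sort"
)

// diffIDs compares the resource IDs present on main against those on the
// current branch, returning the IDs that were added and removed so they
// can be surfaced for review (e.g. segment casing checks, breaking changes).
func diffIDs(mainIDs, branchIDs []string) (added, removed []string) {
	onMain := make(map[string]bool, len(mainIDs))
	for _, id := range mainIDs {
		onMain[id] = true
	}
	onBranch := make(map[string]bool, len(branchIDs))
	for _, id := range branchIDs {
		onBranch[id] = true
		if !onMain[id] {
			added = append(added, id)
		}
	}
	for _, id := range mainIDs {
		if !onBranch[id] {
			removed = append(removed, id)
		}
	}
	// Sort for deterministic, reviewer-friendly output.
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	mainIDs := []string{
		"/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}",
		"/subscriptions/{subscriptionId}/providers/Microsoft.Compute/skus",
	}
	branchIDs := []string{
		"/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}",
		"/subscriptions/{subscriptionId}/providers/Microsoft.Network/virtualNetworks",
	}
	added, removed := diffIDs(mainIDs, branchIDs)
	fmt.Println("added:", added)
	fmt.Println("removed:", removed)
}
```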
stephybun added the enhancement and tool/data-api labels Nov 9, 2023
@tombuildsstuff (Contributor) commented Nov 10, 2023:

Calling out a design limitation from the current setup / a topic that we've previously talked about here too:

The V1 tool (extract-tf-resource-ids) currently has a hard-dependency on the GitHub API which means this tool can't be used locally.

In addition, the GitHub API uses pagination when the changeset associated with a pull request contains over 3,000 file changes, which the V1 tool isn't accounting for (meaning that we're not outputting all the changes today, example) - and whilst we could update the tool to handle pagination, switching to a local setup removes this limitation.
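For reference, handling pagination correctly is mostly a matter of decoupling the page loop from the HTTP client. A hedged sketch, where the fetcher signature and the fake data are illustrative stand-ins for the real GitHub API call:

```go
package main

import "fmt"

// fetchPage returns the items on a given 1-indexed page plus whether more
// pages remain; in the real tool this would wrap a GitHub API request for
// the files changed in a pull request.
type fetchPage func(page int) (items []string, hasMore bool)

// collectAll walks every page until the fetcher reports no more results,
// so no changed files are silently dropped from the diff.
func collectAll(fetch fetchPage) []string {
	var all []string
	for page := 1; ; page++ {
		items, hasMore := fetch(page)
		all = append(all, items...)
		if !hasMore {
			return all
		}
	}
}

func main() {
	// Fake three-page fetcher standing in for the GitHub files endpoint.
	data := [][]string{{"a.go", "b.go"}, {"c.go"}, {"d.go"}}
	fetch := func(page int) ([]string, bool) {
		return data[page-1], page < len(data)
	}
	fmt.Println(collectAll(fetch)) // all four files, across three pages
}
```

That said, as the comment notes, the local setup below avoids the GitHub API entirely, which makes this moot.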

As such, rather than relying on the contents of local files, presumably it'd be better to query the Data API to get the data from both main and the current branch and then perform the diff, which would presumably be easiest by having the tool do something along these lines:

  1. Cloning a copy of the main branch from the current working directory (e.g. this repository - intentionally not main from GitHub) into a temporary directory (e.g. git clone -b main /path/to/working/directory /temp/directory, but presumably in code, which looks to be possible via the options).
  2. Spinning up an instance of the Data API from that branch (to give us a baseline) - specifying the port the Data API should use as an Environment Variable.
  3. Having the tool retrieve the data from the main version of the Data API.
  4. Shutting that version of the Data API down.
  5. Launching the Data API from the current working directory - specifying the port the Data API should use as an Environment Variable.
  6. Having the tool retrieve the data from that version of the Data API.
  7. Shutting that version of the Data API down.
  8. Performing the diff - optionally writing it to a file if specified.
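The clone-and-launch steps above could be sketched roughly as follows. This is a non-authoritative sketch: the Data API entry-point path (`./cmd/data-api`) and the environment variable name (`PANDORA_API_PORT`) are assumptions for illustration, not the tool's actual names:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// cloneMain builds the `git clone -b main <workingDir> <tempDir>` command,
// cloning main from the local working copy rather than from GitHub.
func cloneMain(workingDir, tempDir string) *exec.Cmd {
	return exec.Command("git", "clone", "-b", "main", workingDir, tempDir)
}

// launchDataAPI builds the command to start the Data API from a given
// checkout, passing the listen port via an environment variable.
// NOTE: both the entry-point path and the variable name are assumptions.
func launchDataAPI(checkoutDir string, port int) *exec.Cmd {
	cmd := exec.Command("go", "run", "./cmd/data-api")
	cmd.Dir = checkoutDir
	// Inherit the parent environment, then append the port override.
	cmd.Env = append(os.Environ(), fmt.Sprintf("PANDORA_API_PORT=%d", port))
	return cmd
}

func main() {
	fmt.Println(cloneMain("/path/to/working/directory", "/tmp/baseline").Args)
	fmt.Println(launchDataAPI("/tmp/baseline", 8080).Dir)
}
```

The remaining steps (retrieve, shut down, diff) would then run each command with `cmd.Start()`/`Process.Kill()` and query the API on the chosen port.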

This approach would mean that we're able to run this locally - but it would also allow us to output more semantic types of changes in the future (e.g. highlighting when a new Discriminator implementation gets added, or flagging when the value of a constant changes [e.g. casing]) - in addition to what we're doing today (around the Resource ID segments).

In addition, it means that when this is run in GitHub and an output file for the diff is specified, we can have that posted as a comment on the pull request when run in automation (referencing how hashicorp/go-azure-sdk does this) - meaning the tool doesn't need to interact with GitHub at all.

In order to do that, I suspect we'd want to make a couple of changes to the Data API:

  1. Adding support for launching the Data API on a given port (making life easier when running in automation/avoiding conflicts when running the Data API locally) - the automation already has an Environment Variable defined for this fwiw. (Tracked in Data API V2: Changes needed to enable Automation #3323)
  2. Adding an optional flag for --data-directory, to allow overriding the path to the ./api-definitions directory. This means we could launch the same compiled version of the Data API once, rather than compiling it twice, in the above steps - which feels beneficial? (Tracked in Data API V2: Changes needed to enable Automation #3323)

WDYT?
