Switch to API v4 (GraphQL) #83

Closed
docwhat opened this issue Mar 28, 2018 · 36 comments
Labels
Provider Status: Up for grabs Issues that are ready to be worked on by anyone Type: Feature New feature or request

Comments

@docwhat
Contributor

docwhat commented Mar 28, 2018

The GitHub v4 (GraphQL) API is now official.

Since we're doing large queries across lots of objects in GitHub, using GraphQL will be significantly faster and use up less of the API Rate Limit.

Both reading and writing (mutating) are supported.
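
To make the rate-limit point concrete, here is a minimal sketch of one GraphQL request body that asks for up to 100 repositories of an organization, with a few fields each, in a single round trip. The helper name `build_repo_query` and the choice of fields are illustrative assumptions and not something the provider ships. The `organization` and `repositories` fields and the `first`/`after` pagination arguments come from GitHub's public GraphQL schema.

```python
import json

# Illustrative query: the selected fields are an assumption, chosen only to
# show several objects being fetched in one request.
REPOS_QUERY = """
query($org: String!, $cursor: String) {
  organization(login: $org) {
    repositories(first: 100, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes { id name isPrivate }
    }
  }
}
"""


def build_repo_query(org, cursor=None):
    """Return the JSON request body for one page of the repository listing."""
    return json.dumps({"query": REPOS_QUERY, "variables": {"org": org, "cursor": cursor}})
```

Passing the `endCursor` of one page as `cursor` for the next call walks the whole list, 100 repositories at a time.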

@vancluever vancluever added the Type: Feature New feature or request label Mar 29, 2018
@joebowbeer

@captn3m0

We hit our rate limits daily (150 repos, 30 teams, and ~200 users in the org), largely because plans fetch repos, teams, and team memberships one by one.

Switching to GraphQL will help alleviate this significantly.
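
A rough back-of-the-envelope sketch of the difference, using the numbers above. It assumes exactly one REST call per object and that a paged listing returns 100 objects per call. Both are simplifications: a real plan makes more calls per object, so this understates the REST side.

```python
import math


def one_by_one_calls(repos, teams, users):
    # Assumption: one REST call per object, as described in the comment above.
    return repos + teams + users


def paged_calls(repos, teams, users, page_size=100):
    # Assumption: each collection is listed separately, page_size objects per call.
    return sum(math.ceil(count / page_size) for count in (repos, teams, users))


print(one_by_one_calls(150, 30, 200))  # 380
print(paged_calls(150, 30, 200))       # 5
```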

@deppy

deppy commented Jul 30, 2019

They're adding functionality to the GraphQL/v4 API that isn't being added to v3 (the triage and maintain permissions, for example).

@joebowbeer looks like repo creation has been added, didn't see a get-contents equivalent though.

@captn3m0

A simple cache layer based on the GraphQL API, used while refreshing any resource, would speed up operations considerably: one call fetches the list of all repositories, and the cached result is then used whenever an individual repository's information is fetched.

That way it could use both?
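
A minimal sketch of that idea, assuming a hypothetical `list_all` callable that returns every repository in one listing (for example, a paginated GraphQL query). The class and its names are illustrative only.

```python
class RepositoryCache:
    """Fetch the full repository list once, then answer lookups from memory."""

    def __init__(self, list_all):
        # list_all is a hypothetical callable returning an iterable of dicts
        # that each contain at least a "name" key.
        self._list_all = list_all
        self._by_name = None

    def get(self, name):
        # The listing is fetched lazily on first use and reused afterwards.
        if self._by_name is None:
            self._by_name = {repo["name"]: repo for repo in self._list_all()}
        return self._by_name.get(name)
```

Every resource refresh that needs repository data then calls `get` instead of issuing its own request, so the listing is paid for once per run.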

@patrickmarabeas
Contributor

I've done a fair amount of work on a GraphQL version of the provider.

The two major foundational pieces look to be:

  • Running REST and GraphQL clients side-by-side (including updating resources)
  • Updating all resources to use their Global Object ID (including writing state upgraders)

This would then allow resources to be uplifted as required or when possible - the GraphQL API is still pretty barebones.

I'll start to look at merging the work I've done back in.

@patrickmarabeas
Contributor

@julianvmodesto

Would it be possible to select the API version for a resource, if v3 and v4 are supported?

The v3 API has conditional requests, which allows for read requests to be cached, so it would be good to keep resources at v3 (at least the reads).
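
For readers unfamiliar with conditional requests, here is a sketch of the mechanism using the standard `ETag` and `If-None-Match` HTTP headers. The `send` transport is a hypothetical stand-in for a real HTTP client. GitHub's REST documentation states that a `304 Not Modified` reply to a correctly authorized request does not count against the primary rate limit, which is why keeping reads on v3 can be attractive.

```python
class ConditionalCache:
    """Replay cached bodies when the server answers 304 Not Modified."""

    def __init__(self, send):
        # send(url, headers) -> (status, response_headers, body) is a
        # hypothetical transport; swap in a real HTTP client as needed.
        self._send = send
        self._entries = {}  # url -> (etag, body)

    def get(self, url):
        headers = {}
        cached = self._entries.get(url)
        if cached:
            headers["If-None-Match"] = cached[0]
        status, response_headers, body = self._send(url, headers)
        if status == 304 and cached:
            return cached[1]  # unchanged on the server, reuse the stored body
        etag = response_headers.get("ETag")
        if status == 200 and etag:
            self._entries[url] = (etag, body)
        return body
```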

@kpfleming
Contributor

It is certainly possible; for example, in the OpenStack provider many resources have _v1, _v2, and _v3 flavors in order to accommodate users with different needs. This provider could provide _v4 flavors of the resources for users who want to upgrade.

@patrickmarabeas
Contributor

What would the "upgrade" / "downgrade" path be between the different resources? State migration?

Won't this be fairly painful with differences in IDs between the resources? At least when it's the same resource you can include an upgrade function which manages this stuff...

@patrickmarabeas
Contributor

I would like to keep momentum on this work - who needs to be involved in deciding how this should proceed?

@patrickmarabeas
Contributor

I've done some preliminary tests on handling the two different APIs using two separate providers. It seems to be the better choice.

This way the resources themselves remain consistent and should allow for upgrade functions to work if we start v4 schema versions from say, v100, so there's no possible clash of a resource version between the two providers. Every version can have an update function - hopefully in both directions as well.

provider "github" {
  organization = "myorg"
  ...
}

provider "github-v4" {
  organization = "myorg"
  ...
}

resource "github_branch_protection" "master" {
  provider = github-v4 # defaults to github
  ...
}

How does this sit with everyone?

@patrickmarabeas
Contributor

PR to help interface v3 and v4 data sources and resources: https://github.com/terraform-providers/terraform-provider-github/pull/383

@docwhat
Contributor Author

docwhat commented Mar 10, 2020

As long as resources all use GitHub's IDs to track objects, then switching between v3 and v4 should be easy. e.g. use repositoryId instead of the org/repo.

@patrickmarabeas
Contributor

patrickmarabeas commented Mar 11, 2020

It would need to be the node_id, from which we can derive the id.
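
As an illustration of deriving the id, the sketch below decodes a legacy-format node ID, which is a base64 encoding of a string such as `04:User583231`. This is an assumption about the older ID format only: newer-format node IDs are encoded differently and are rejected here, and GitHub advises treating node IDs as opaque. The function name is hypothetical.

```python
import base64
import re


def legacy_node_id_to_database_id(node_id):
    """Return (type_name, database_id) for a legacy base64 node ID.

    Raises ValueError for anything that does not decode to the assumed
    "<digits>:<TypeName><digits>" layout, including newer-format IDs.
    """
    decoded = base64.b64decode(node_id, validate=True).decode("ascii")
    match = re.fullmatch(r"\d+:([A-Za-z]+)(\d+)", decoded)
    if match is None:
        raise ValueError(f"not a legacy node ID: {node_id!r}")
    return match.group(1), int(match.group(2))
```

For example, `legacy_node_id_to_database_id("MDQ6VXNlcjU4MzIzMQ==")` yields the type name `User` and the numeric id `583231`.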

Relevant issue / PR: https://github.com/terraform-providers/terraform-provider-github/issues/55 & https://github.com/terraform-providers/terraform-provider-github/pull/65

Additionally, not all resources have IDs in v3. Branch protection is one such example. State ID should not be <repo_node_id>/<branch_name> as v4 uses pattern instead of branch, and a change in pattern should not recreate the resource.

@patrickmarabeas
Contributor

What's the process for creating a new official provider, if this is the route we are taking? This is the v4 provider I've pulled together thus far: https://github.com/patrickmarabeas/terraform-provider-github-v4

@jcudit
Contributor

jcudit commented Mar 12, 2020

👋 @patrickmarabeas - I am unaware of a process to migrate from one official provider to another but can check with other provider owners if they've had to solve breaking changes in that manner.

I have seen discussion around cutting a new major version of this provider over in https://github.com/terraform-providers/terraform-provider-github/issues/354, which would align with the work you've been driving on the GraphQL front. Would a 3.x release help with upstreaming your work into this repository? Or does a new provider altogether provide a better migration path overall?

@patrickmarabeas
Contributor

So it looks like there is a desire to pick the API being used. It also looks like v3 and v4 APIs are being developed in parallel - so depending on the features you require....

Google has google and google-beta, I assume resource versions would flow from google-beta to google - while we are really looking at running two providers side by side, and if features are coming out on both APIs, this will mean schema versions increment on both. Overlapping of these versions would probably throw a curve ball at any upgrade functions that were implemented.

Alignment of the IDs between the two providers would go a long way to provide easy swapping without needing upgrade/downgrade functions to alter the state ID - BUT unfortunately the v3 API seems to be lacking in IDs. I'm not sure how to navigate that hurdle unless it's just accepted that some resources will get recreated.

An alternate approach is to feed github-v4 into github only as resources hit maturity, but we'll still need alignment on the ID front.

Implementing v4 into the v3 provider to look up resource IDs where needed could also be a solution.

@jcudit
Contributor

jcudit commented Mar 18, 2020

An alternate approach is to feed github-v4 into github only as resources hit maturity, but we'll still need alignment on the ID front.

My vote goes to this one ☝️ . Being selective about which resources receive the update is a path I have time to support. A quick look at the request logs from an acceptance test suite run shows where low hanging fruit may be:

$ cat test.log | grep -A2 REQUEST | grep HTTP/ | cut -d '/' -f 2 | sort | uniq -c
   4 meta HTTP
  81 organizations
 710 orgs
  33 projects
 684 repos
  18 search
  35 teams
  16 user
  25 users
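
The same tally can be sketched in Python for logs too large or too irregular for a shell pipeline. It mirrors the pipeline step by step: keep each line containing `REQUEST` plus the two lines after it, keep those containing `HTTP/`, and count the second `/`-separated field. The log layout is an assumption inferred from the command above, not a documented format.

```python
from collections import Counter


def count_request_prefixes(lines):
    """Count the first path segment of each logged request line."""
    counts = Counter()
    remaining = 0  # lines left in the current "grep -A2 REQUEST" window
    for line in lines:
        if "REQUEST" in line:
            remaining = 2
            in_window = True
        elif remaining > 0:
            remaining -= 1
            in_window = True
        else:
            in_window = False
        if in_window and "HTTP/" in line:
            # Equivalent of: cut -d '/' -f 2
            counts[line.rstrip("\n").split("/")[1]] += 1
    return counts
```

Calling it as `count_request_prefixes(open("test.log"))` returns a `Counter` whose `most_common()` output shows the heaviest endpoints first.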

Working backwards from the goal of not being rate limited, I support a v2.5 release that would ...

  • Add a v4 client
  • Add Optional v4 Client Adoption To github_repository Resources
  • Add Acceptance Test Coverage For ☝️ (cc @kmoe)
  • Add State Migration Coverage For ☝️

As for how we make adoption optional for the github_repository resource, I prefer a feature flag approach at the provider level:

provider "github" {
  version                   = "2.3.0"
  organization              = var.organization
  enabled_graphql_resources = ["github_repository"]
}

@patrickmarabeas
Contributor

Happy to take the above approach @jcudit and start working on this if there's going to be some degree of traction...

Please get the following branches merged:

  • v4 client (so we can have GraphQL goodness)
  • node_id (v4 resources expect node_ids, not names of things)

Once these are in I'll look at adding:

@patrickmarabeas
Contributor

@jcudit It isn't looking good. I don't see a way to dynamically set the schema resource. Best seems to be checking at the CRUD function level, but that ultimately means the schema has to be the same between APIs.
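
A sketch of what checking at the CRUD function level could look like, and why it pins the schema: whichever client answers, the read function has to fill the same fields. The function and the two client callables are hypothetical. `private` (REST) and `isPrivate` (GraphQL) are the real field names in the respective APIs.

```python
RESOURCE = "github_repository"


def read_repository(name, enabled_graphql_resources, rest_get, graphql_get):
    """Read one repository through whichever API the provider flag selects.

    rest_get and graphql_get are hypothetical callables that return the raw
    API payload as a dict. Both branches must produce the same keys, which is
    the shared-schema constraint described above.
    """
    if RESOURCE in enabled_graphql_resources:
        node = graphql_get(name)
        return {"name": node["name"], "private": node["isPrivate"]}
    repo = rest_get(name)
    return {"name": repo["name"], "private": repo["private"]}
```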

@jcudit
Contributor

jcudit commented Apr 14, 2020

@patrickmarabeas understood. I propose we push to merge the changes made so far to unblock other efforts. I will follow up on the PR with next steps for closing it out. We can then open up discussion around a better design to achieve feature flagging.

@patrickmarabeas
Contributor

Sounds good.

Do we really need to increase complexity and provide a long lasting option between APIs?

The simplest approach would be having a github-beta provider that houses up-and-coming resources using the new API. Once the API is feature ready / supersedes v3 it can be deprecated from github-beta and implemented in github with a schema version bump.

@jcudit
Contributor

jcudit commented Apr 15, 2020

Ah, my inexperience with schema versioning is showing here. I now understand the github-beta approach better, but not fully. I am still unaware of how users would evaluate the new provider and the interoperability between state managed by the original provider and the beta one. This gap in understanding is what makes the approach not simple from my perspective.

I can take some time to read further on this front, but currently prefer to merge and investigate flagging in functionality due to the reasons above.

@patrickmarabeas
Contributor

Another approach would be to release feature versions on branches while they are in "beta". If the user requires multiple features, they would need to include multiple versions of the provider.

Could you detail your thinking behind why v4 resources should be opt in? Is this just for beta testing, or is it a long lived option?

@jcudit
Contributor

jcudit commented Apr 17, 2020

I see it as a beta testing workflow, similar to how the GO111MODULE=on environment variable provides the user with a choice of when to adopt recommended functionality.

My biggest fear with this change is breaking existing state due to incompatibilities with resource identifiers before and after GraphQL support. I see the adoption of GraphQL as a performance improvement to work around API request rate limiting. I think trading durability for performance is valued by some users but not all and I want to leave an option for the use cases where rate limiting is not a pain point.

That being said, there are other ways to resolve my main concern and the uplifting approach with state migration coverage is likely the correct strategy. I appreciate you pausing to explore available approaches.

@patrickmarabeas
Contributor

Understood.

#305 also highlights continued feature discrepancy between v3 and v4.

I think it's going to have to be done via feature toggling, but perhaps not so black and white - some resources:

  • are on feature parity in both APIs
  • have no v4 footprint
  • would require intermixed API calls to achieve full feature set, BUT:
    • may have inputs which are incompatible with each other - e.g. branch protection (a v4 pattern input of release-* would be incompatible as a v3 branch name)
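
The branch protection mismatch can be illustrated with a short sketch: a pattern selects a set of branches, while a v3 branch name identifies exactly one. Python's `fnmatchcase` is used here only as an approximation of GitHub's pattern matching, which differs in details (for instance, how `*` treats `/`). The branch list is made up.

```python
from fnmatch import fnmatchcase


def branches_matching(pattern, branches):
    """Return the branches that a v4-style pattern would cover."""
    return [branch for branch in branches if fnmatchcase(branch, pattern)]


branches = ["master", "release-1.0", "release-2.0"]
print(branches_matching("release-*", branches))  # ['release-1.0', 'release-2.0']
print(branches_matching("master", branches))     # ['master']
```

A literal name such as `master` works as either input, but `release-*` has no single-branch equivalent on the v3 side.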

In the case of other resources, e.g. teams (#339 - I anticipate the resource would be similar to that of the data source linked), where the data set is/can be a lot richer via the v4 API - the schema would need to reflect that of the v4 return, culling a lot of the garbage data v3 returns, but also returning nil for many data points if only the v3 API is used.

Hopefully that brain dump makes some sense.

@jcudit
Contributor

jcudit commented Apr 20, 2020

I've done some more reading and feel comfortable with the original approach where we uplift individual resources without feature toggling. The only change I can think to add to our original plan is an intermediate step to address the concern around resource identifiers changing and breaking a user's configuration.

At this point we could update all resource identifiers to use global node IDs. The original IDs, if different from the new global IDs, can be archived to a v3_id field. This gives users whose configurations depend on a resource's legacy ID a long-lived value to reference while we align with GitHub's recommendation for the long term.
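
A minimal sketch of such a migration step, written over a plain dict rather than the real Terraform SDK state types. The function name and the `lookup_node_id` callable are hypothetical. Only the `v3_id` field name comes from the proposal above.

```python
def upgrade_state_to_node_id(state, lookup_node_id):
    """Return a copy of the state whose "id" is the global node ID.

    lookup_node_id is a hypothetical callable mapping a legacy ID to its
    node ID. The legacy value is kept under "v3_id" only when it differs.
    """
    upgraded = dict(state)
    legacy_id = state["id"]
    node_id = lookup_node_id(legacy_id)
    if node_id != legacy_id:
        upgraded["v3_id"] = legacy_id
    upgraded["id"] = node_id
    return upgraded
```

The input dict is left untouched, so the step can be retried safely if a later upgrader in the chain fails.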

What do you think? Would this extra step make further uplifts easier / safer? Or do you think we should continue with the original plan as-is and initially focus on the resources with feature parity?

@patrickmarabeas
Contributor

My only concern is the feature differential between APIs - which might fluctuate over time between v3 and v4. What are your thoughts on resources using both APIs to achieve the maximum feature footprint when required?

I might draw up a table of id interactions between resources to spot any issues we might run into, and then spin up a new issue to track that work.

We can also use the branch protection uplift as a test bed for this - as it definitely appears to be the edge case in all this.

@jcudit
Contributor

jcudit commented Apr 21, 2020

resources using both APIs to achieve the maximum feature footprint when required

💯 on this and your other suggestions. We should harness both, defaulting to v4 now that it is available.

Whatever ends up being written will likely be a model for new resources to follow. Keep that in mind as this is an opportunity to DRY things up over time.

@markediez

Any news on this?

@spr-mweber3

Any news on this? This is really hitting us hard. It's really overdue that the provider utilizes the features introduced several years ago. Are there any plans on moving to the GraphQL API?

@kfcampbell
Member

@spr-mweber3 are there particular resources you'd like to see the GraphQL API used in? We're using it in a number of places so far.

@spr-mweber3

@kfcampbell, yes sure. I'm particularly interested in having the resource github_repository_file implement that. Currently we're doing quite a lot with it and are hitting the rate limit quickly because every file is read/committed separately. It's just a waste of API calls the way this functionality is currently implemented in the provider.
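
One way the read side could be batched, offered only as a sketch: the thread does not say how github_repository_file would be migrated. The query below uses GraphQL aliases to read several files in one request through the `object(expression: ...)` field of `Repository`. The helper name and alias scheme are assumptions.

```python
def build_files_query(owner, repo, branch, paths):
    """Build one GraphQL request body that reads the text of several files."""
    if not paths:
        raise ValueError("at least one path is required")
    variables = {"owner": owner, "repo": repo}
    declarations = ["$owner: String!", "$repo: String!"]
    selections = []
    for index, path in enumerate(paths):
        # Each file gets its own alias (f0, f1, ...) and "<branch>:<path>" expression.
        variables[f"expr{index}"] = f"{branch}:{path}"
        declarations.append(f"$expr{index}: String!")
        selections.append(
            f"f{index}: object(expression: $expr{index}) {{ ... on Blob {{ text }} }}"
        )
    query = (
        "query(" + ", ".join(declarations) + ") { "
        "repository(owner: $owner, name: $repo) { " + " ".join(selections) + " } }"
    )
    return {"query": query, "variables": variables}
```

The response would then carry one entry per alias, so reading N files costs one request instead of N.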

@kfcampbell
Member

Is that behavior you might be interested in submitting a PR to change?

@spr-mweber3

@nickfloyd @kfcampbell Sorry for the late response, guys. Unfortunately I have neither the time nor the skills to fix this in a PR. Not my area of expertise (yet), but I'm happy to see that it already got picked up with high priority. 💯

If I can be of any assistance in shaping the requirements of this feature or any other way, just let me know. I'd be really grateful if the github_repository_file resource would be a bit more thrifty in API calls. And I don't think I'm the only one.

@nickfloyd nickfloyd added the Status: Up for grabs Issues that are ready to be worked on by anyone label Nov 15, 2022
@nickfloyd nickfloyd moved this to 🔥 Backlog in 🧰 Octokit Active Dec 5, 2022
@kfcampbell
Member

@spr-mweber3 can I talk you into creating an issue for the particular resource(s) you want migrated over to the GraphQL API? I'd like to close this issue as-is, since we're not going to be doing a single PR to cut over every resource the GraphQL API would support.

@github-project-automation github-project-automation bot moved this from 🔥 Backlog to ✅ Done in 🧰 Octokit Active Jan 5, 2023