server support for multiple API versions #869

Closed
davepacheco opened this issue Jan 5, 2024 · 1 comment · Fixed by #1115
Comments

@davepacheco
Collaborator

@ahl already started this in #862. Here are some more detailed notes on the plan we discussed a few weeks ago.

Motivation

We use Dropshot in Omicron to provide OpenAPI/HTTP APIs between many components. We want to be able to evolve these APIs in both compatible and incompatible ways. We also want to do rolling upgrades of these components without downtime.

With JSON, of course, it's fairly easy to phrase most API changes backwards-compatibly: have the implementation ignore any unknown fields, make any new fields optional, don't change the semantics of existing fields, etc. These approaches essentially make any new API a superset of the previous one. One can evolve APIs with Dropshot this way today. This approach has two big disadvantages: (1) It can be hard to know that you've done this correctly, i.e., that you haven't accidentally made some change that will cause requests from older clients to fail. (2) You lose a lot of the advantages we enjoy today around strong typing on the server side, the spec, and the client side. In the limit, after a lot of changes, we might expect most of the API to wind up being a bunch of Options of complex enum types covering all the variants that previous clients might have sent.

Instead, we've proposed this basic approach:

  • each version of an API has an explicit version number or string (this is true already)
  • clients always specify which API version they want to use
  • servers support multiple API versions and pick which one to use based on what the client asks for (that's what this issue is all about)
  • when we build the image used to deploy any component (whether a client or server of a dropshot API) we include metadata about what APIs the component provides and consumes
  • the deployment system ensures that when deploying anything, there already exist components that provide whatever APIs (and versions) the component depends on

In this way, you can roll out a breaking change to an API by:

  • starting with server and clients at the old version
  • rolling out a version of the server that supports both the old and new version
  • rolling out a version of the clients that use the new version
  • rolling out a version of the server that supports only the new version
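The four steps above can be sketched as a small model. This is only an illustration, not anything from Dropshot or Omicron: `ApiVersion`, `Fleet`, and `breaking_change_rollout` are hypothetical names, and each step is treated as a settled state (all servers on one build, all clients on one build) rather than the mixed states that occur mid-rollout.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for an API version number.
type ApiVersion = u32;

/// One settled state of the rollout: the API versions the server accepts
/// and the single version that every client requests.
struct Fleet {
    server_supports: BTreeSet<ApiVersion>,
    client_requests: ApiVersion,
}

impl Fleet {
    /// Clients can talk to the server only if it accepts their version.
    fn compatible(&self) -> bool {
        self.server_supports.contains(&self.client_requests)
    }
}

/// The four steps of rolling out a breaking change from `old` to `new`.
fn breaking_change_rollout(old: ApiVersion, new: ApiVersion) -> Vec<Fleet> {
    vec![
        // 1. Server and clients at the old version.
        Fleet { server_supports: BTreeSet::from([old]), client_requests: old },
        // 2. Server supports both versions; clients unchanged.
        Fleet { server_supports: BTreeSet::from([old, new]), client_requests: old },
        // 3. Clients move to the new version.
        Fleet { server_supports: BTreeSet::from([old, new]), client_requests: new },
        // 4. Server drops the old version.
        Fleet { server_supports: BTreeSet::from([new]), client_requests: new },
    ]
}
```

Every state in the sequence is compatible. Skipping step 2 (a server that supports only the new version while clients still request the old one) is the state this ordering is meant to avoid.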

This issue is all about the "servers support multiple API versions" item above. While on some level this can be achieved very easily (essentially copy/paste all the endpoint registrations multiple times), we want an approach following our usual guidelines: it should be easy to do the right thing and hard to accidentally do the wrong thing. If you want to make a change to an API, breaking or not, it shouldn't require duplicating the whole server and it should be very hard to accidentally break an earlier version of the API.

For those with access, there's more on this in Oxide RFD 421.

Proposal

  • Each endpoint can be tagged with a semver range. See #862 (WIP: dropshot support for multiple API versions) for an example.
  • ApiDescription becomes MultiVersionApiDescription (logically, if not literally).
    • As endpoints are registered, they're organized in the router data structure by version.
    • When a request comes in, the server will use the client-requested version (if provided) when routing the request to a handler.
    • The ApiDescription will be able to spit out an API spec for any semver. It does the same thing it does today, just using the version information to determine if a given endpoint (and its types) should be included in the spec.
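As a rough sketch of the idea (not Dropshot's actual API), the following models endpoints tagged with a version range, routing by client-requested version, and per-version spec generation. All names here (`Endpoint`, `MultiVersionApi`, `route`, `spec_paths`, `example_api`) are hypothetical, versions are simplified to a `(major, minor, patch)` tuple, and handlers are represented by their names. What happens when the client provides no version is left out.

```rust
/// Simplified semver: tuples compare lexicographically (assumption: no
/// pre-release or build metadata).
type Version = (u64, u64, u64);

/// An endpoint tagged with the half-open version range it belongs to.
struct Endpoint {
    method: &'static str,
    path: &'static str,
    handler: &'static str,       // stands in for a real handler function
    added_in: Version,           // first version that includes the endpoint
    removed_in: Option<Version>, // first version that no longer includes it
}

impl Endpoint {
    fn in_version(&self, v: Version) -> bool {
        v >= self.added_in && self.removed_in.map_or(true, |r| v < r)
    }
}

struct MultiVersionApi {
    endpoints: Vec<Endpoint>,
}

impl MultiVersionApi {
    /// Route a request using the version the client asked for.
    fn route(&self, method: &str, path: &str, v: Version) -> Option<&'static str> {
        self.endpoints
            .iter()
            .find(|e| e.method == method && e.path == path && e.in_version(v))
            .map(|e| e.handler)
    }

    /// List the operations that a spec for version `v` would contain.
    fn spec_paths(&self, v: Version) -> Vec<String> {
        let mut paths: Vec<String> = self
            .endpoints
            .iter()
            .filter(|e| e.in_version(v))
            .map(|e| format!("{} {}", e.method, e.path))
            .collect();
        paths.sort();
        paths
    }
}

/// A made-up API in which `GET /widgets` changed incompatibly in 2.0.0.
fn example_api() -> MultiVersionApi {
    MultiVersionApi {
        endpoints: vec![
            Endpoint { method: "GET", path: "/widgets", handler: "list_widgets_v1",
                       added_in: (1, 0, 0), removed_in: Some((2, 0, 0)) },
            Endpoint { method: "GET", path: "/widgets", handler: "list_widgets_v2",
                       added_in: (2, 0, 0), removed_in: None },
            Endpoint { method: "DELETE", path: "/widgets", handler: "delete_widgets",
                       added_in: (1, 0, 0), removed_in: None },
        ],
    }
}
```

With this model, a request for `GET /widgets` at version 1.3.0 resolves to `list_widgets_v1`, the same request at 2.0.0 resolves to `list_widgets_v2`, and a request at 0.9.0 matches nothing.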

In consumers like Omicron today, we check the current API spec into the repo and use an expectorate test to avoid accidentally changing it (i.e., to give us a chance to review any potentially breaking changes). We've largely been okay with breaking most of our APIs so far, so we've just allowed this file to evolve as needed.

The proposal for consumers would be:

  • Instead of checking in just one spec, there'd be a directory of supported spec versions. For each one, we'd have Dropshot re-generate the spec for that version (from the current code) and verify that it matches the checked-in spec. It'd be nice to just say that we'd never expect one of these older spec files to change, but in practice, we might need to allow changes here while limiting them to provably-compatible changes (e.g., doc changes).
  • When you want to make a breaking change to an API, you:
    • Copy the latest file to a new version.
    • Make whatever changes you want to the dropshot server. If you're adding a new endpoint, mark it added in your new version. If you're removing an old endpoint, you just mark it removed in your new version. If you want to change an input type, you could do this by "removing" the old endpoint that used that type and "adding" one that uses the new type.
    • Run the tests. The old specs should not change. The new spec should differ only in the ways you expect.
    • Note that you can do this even if the most recent version was never "released" to customers and we don't care about officially supporting it. This way, development is no different from release, and the whole mechanism works (and is exercised) in dev/test, too.
  • You can remove support for an old version at any time by just removing the old spec file and any endpoints that are no longer used by any version.
    • You can remove support for the oldest version once you know that clients are all upgraded past it in all deployments.
    • You can also choose to remove support for intermediate versions if you know they've never been released to supported systems.
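The per-version check described above might look roughly like this. It is a sketch under simplifying assumptions: specs are plain strings held in memory rather than files in a directory, `changed_specs` is a hypothetical helper, and the `generate` callback stands in for whatever Dropshot would expose to produce the spec for a given version.

```rust
use std::collections::BTreeMap;

/// Compare each checked-in spec (keyed by version) against the spec
/// regenerated from the current code. Returns the versions whose
/// regenerated spec differs from what is checked in.
fn changed_specs<F>(checked_in: &BTreeMap<String, String>, generate: F) -> Vec<String>
where
    F: Fn(&str) -> String,
{
    checked_in
        .iter()
        .filter(|(version, spec)| generate(version.as_str()) != **spec)
        .map(|(version, _)| version.clone())
        .collect()
}
```

A test built on this would fail if the returned list contains any version other than the one being introduced. Allowing provably-compatible differences such as doc changes would need a smarter comparison than string equality.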

This is a very general approach. It should be possible to implement any sort of breaking change this way. We've discussed other approaches that might make it easier to make specific kinds of breaking changes, but they're a lot murkier both in terms of implementing them and whether they'd support the kinds of changes we need.

@davepacheco
Collaborator Author

(Copying some of this from https://github.com/oxidecomputer/rfd/pull/749#discussion_r1639057876)

@ahl raised a case recently that makes this a little trickier: services with circular dependencies. The plan here is basically to say that each HTTP client only supports one version, and servers support multiple versions. But suppose A and B have circular dependencies and we want to rev both A's and B's API versions. We can have the new software versions support both the old and new versions of the APIs, but which client will they use for the other service? If it's the new version, then there's no way to do an online upgrade, because as soon as you upgrade the first instance of A, it can't talk to any of the old B's, and vice versa. If it's the old version, that works, but it means we can't use any new APIs until the next release, even though the current release has servers (and could have clients) that support both. We could instead choose to support multiple client versions, but that's really tricky, too. The server you're talking to can change if it's on the other side of a load balancer, or if it's updated (or rolled back!) without affecting the network connection (as can happen with elastic IPs and the like). So caching information about "what this server supports" across requests is not great.
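To make the circular-dependency case concrete, here is a small model. It is purely illustrative: `Instance`, `can_call`, and `upgrade_is_safe` are hypothetical names, and A and B are assumed to be symmetric, so one `Instance` type describes either service. During a rolling upgrade, old and new instances of both services coexist, so every caller build must be able to reach every callee build.

```rust
use std::collections::BTreeSet;

/// One build of a service: the API versions its server accepts and the
/// single version its client uses when calling the peer service.
struct Instance {
    serves: BTreeSet<u32>,
    client_uses: u32,
}

fn instance(serves: &[u32], client_uses: u32) -> Instance {
    Instance { serves: serves.iter().copied().collect(), client_uses }
}

/// A call succeeds only if the callee's server accepts the caller's version.
fn can_call(caller: &Instance, callee: &Instance) -> bool {
    callee.serves.contains(&caller.client_uses)
}

/// During an online upgrade, old and new builds of both services coexist,
/// so every caller build must be able to reach every callee build.
fn upgrade_is_safe(old: &Instance, new: &Instance) -> bool {
    let mix = [old, new];
    for caller in mix {
        for callee in mix {
            if !can_call(caller, callee) {
                return false;
            }
        }
    }
    true
}
```

With `old = instance(&[1], 1)`, a new build that ships the new client, `instance(&[1, 2], 2)`, is not safe to roll out because it cannot call old instances of its peer. A new build that keeps the old client, `instance(&[1, 2], 1)`, is safe, at the cost of not using the new API until the following release.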
