[public-api] Initial scaffolding and draft #8683
Conversation
Looking pretty good! Below are a couple of ideas (and the reasoning for them).

Domain model

I would recommend using the Kubernetes resource style for structuring the API - having a first-class resource like:

```proto
message Workspace {
  // House-keeping information, like first-class ownership, the project it belongs to etc.
  string id = 1;
  string owner_id = 2;
  string project_id = 3;
  Timestamp created = 4;

  // spec defines the requested specification of the Workspace - it is only modifiable by the caller (user/system) and expresses the desired state of the resource
  WorkspaceSpec spec = 5;

  // status is only modifiable/settable by the system to communicate the currently observed (not necessarily the most up to date - but eventually consistent) status of the system
  WorkspaceStatus status = 6;
}

message WorkspaceSpec {
  // context_url reports the original context URL the workspace was created from
  string context_url = 1;

  // description is a human-readable description of the workspace
  string description = 2;

  // Only start if set to true
  bool start = 3;

  // when set, expresses preference for regional affinity etc.
  string region = 4;

  // other
  ...
}

message WorkspaceStatus {
  // The workspace instance should be what the system "assigns" to the workspace; the user shouldn't set it, so it goes in the status
  WorkspaceInstance instance = 1;

  // Encodes the currently observed state of the Workspace - NEW, QUEUED, IMAGE_BUILDING etc.
  WorkspaceState state = 2;

  ...
}
```

Following this model does a couple of things.
This approach also addresses the errors question: the error is always encoded in the status, with details for both humans and computers.

RPCs

I would encourage trimming the RPC surface a bit more. Concretely:

```proto
service WorkspacesService {
  // Basic CRUD operations on the resource spec - this is quite easy to grasp for most people familiar with REST

  // ListWorkspaces lists workspaces. You can filter for currently running ones by expressing constraints/filters in the ListWorkspacesRequest
  rpc ListWorkspaces(ListWorkspacesRequest) returns ()
  rpc CreateWorkspace() returns ()

  // Some updates may not be possible - like moving a region within a given Workspace ID (at least for now); in that case this would be rejected with an error
  rpc UpdateWorkspace() returns ()

  // Delete should be a soft delete - mark it as "to-delete" and the system begins termination, using the status to update the consumer
  rpc DeleteWorkspace() returns ()

  // Subscribe
  rpc SubscribeToStatusUpdates(SubscribeReq) returns (WorkspaceStatus)

  rpc GetLogs(..) returns (...)
  rpc StreamLogs(...) returns (...)
}
```

Keeping the RPC surface trim eases integration - I spend less time figuring out which method to use (for example Create vs Start) - and it makes operations easier to transactionise.

Happy to go into more detail or discuss this further. I still have limited context on the whole system, so I'm definitely looking for context and details.
Thank you for the feedback - much appreciated :) I had a read and tried to map this back.
Re CRUD/RPC:
I think it looks good for a first attempt. We will need APIs around user IDE settings management, but those can be added later.
After a discussion/conversation with @easyCZ I've added structured errors.
Regarding the structured errors, my thinking has been to use a custom status representation rather than the Google RPC one. Let me illustrate with an example.
However, for requests which are valid, take for example something like:

```proto
message ResourceSpec {
  ...
  bool delete = 10;
}

message ResourceStatus {
  // state is a machine-readable, high-level state of the resource - the last state observed by the controller
  string state = 1; // NEW, QUEUED, RUNNING, DELETING, COMPLETE, ERROR

  // details are human-readable details for each status - when state == ERROR, this can provide hints on what to do to fix it
  string details = 2;
  ...
}

message Response {
  ResourceSpec spec = 1;
  ResourceStatus status = 2;
}
```

All valid requests return this as the underlying Resource - that is, Get, Create, Delete and Update return it, and List returns a repeated of it. So instead of using the default gRPC error model, the error is carried in the resource status.
components/public-api/README.md
```
idempotency_token: "<random_string>",
context_url: "https://github.com/gitpod-io/gitpod",
prebuild: {
    if_available: true,
```
this should be the default behavior.
components/public-api/README.md
### Prebuild aware with log support

```go
contextURL := "https://github.com/gitpod-io/gitpod"
prb := prebuilds.GetRunningPrebuild({ context_url: contextURL })
```
I believe starting prebuilds and starting workspaces have so much in common that we should rather generalize over them and make 'prebuild' a property on the workspaces API rather than duplicating everything in a prebuilds API.
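A rough sketch of what that could look like (the message and field names below are illustrative, not taken from this PR):

```proto
syntax = "proto3";

// Hypothetical shape: prebuild behaviour expressed as a property of
// workspace creation instead of a separate prebuilds API.
message CreateAndStartWorkspaceRequest {
  string idempotency_token = 1;
  string context_url = 2;

  // Prebuild preferences live on the workspace request itself.
  PrebuildOptions prebuild = 3;
}

message PrebuildOptions {
  // Use a prebuild for the context URL if one is available.
  bool if_available = 1;
}
```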
At least for listening to logs and status updates, we could use a prebuild's workspace instance. Also there might be an image build ongoing.
```proto
bool if_available = 3;

// uses a particular prebuild - the prebuild ID is scoped by the context URL.
// Use the PrebuildService to get ahold of the prebuild_id.
string prebuild_id = 4;
```
Why would I explicitly pass a prebuild_id?
Today's createWorkspace call is rather complicated because it mixes prebuild selection and workspace creation to provide a form of "transactional behaviour". By making prebuilds an explicit concern this becomes a lot easier, and we can factor prebuild selection out of workspace creation.
So I could create a workspace with a context URL and prebuild id that don't belong to each other?
In the original design the prebuild_id would be matched against the context URL (basically scoped by the context URL). That feels very complicated.
In the most recent draft the "source" of the workspace is a oneof of context_url or prebuild_id.
I think the Workspace APIs should NOT accept a prebuild ID. Here's why:
The API shouldn't necessarily encode how things work today, but much more the experience we want a user to have. We can use the "time to first 200" as a metric for the API UX.
With this in mind, we should eliminate ambiguity and concepts from the API calls which are not needed, or which the system can compute/discover for itself. In this case, I believe the system should be able to look up whether a prebuild for a context exists, and use it when it does - without explicitly defining the prebuild ID. This also means that the user doesn't need to learn the details of prebuilds to receive their benefits - great UX is invisible.
As a user, it's easier for me to think of workspaces in terms of contexts, and not prebuilds. This means that it will be easier for me to only supply the context and not the prebuild id when I'm integrating. This in turn reflects badly on us because the user is not getting the benefits of prebuilds.
The added benefit is that if we discover a better way to optimize the experience that would replace prebuilds, then the API does not need to change - we're optimizing the workspace startup internally.
I totally agree with the sentiment: focus on API UX. In this particular case, I think there are good reasons for making CreateAndStartWorkspace accept a prebuild_id:
- we want to make prebuilds a first-class citizen within the API, hence making the use of prebuilds explicit in the API is desirable. The API itself forms a language which describes the concepts it operates on. In this case it makes it obvious that workspaces can be created from context URLs or prebuilds. And yes, prebuilds are created from context URLs, too :)
- hiding the prebuilds away for this call makes the operation rather complicated: the API becomes "modal" like it is today, where the result of an operation is not always a created/started workspace (or error), but there are different operating modes which in turn require different responses.
- we want to make sure users actually get the prebuild they just watched the logs for. If we just made the prebuild "implicit", we'd need to introduce "modal responses" (workspace created vs prebuild running) where the latter needs to provide info on how to listen to the logs (short of providing a prebuild ID :D ); and we'd need to maintain a "transaction" where we tie the prebuild log listener to the workspace start. Making those two concerns (log listening and prebuild use) explicit is easier.
Codecov Report
```diff
@@            Coverage Diff             @@
##             main    #8683      +/-   ##
==========================================
- Coverage   12.31%   11.17%   -1.14%     
==========================================
  Files          20       18       -2     
  Lines        1161      993     -168     
==========================================
- Hits          143      111      -32     
+ Misses       1014      880     -134     
+ Partials        4        2       -2     
```
Wrt "encoding errors in the type system": did we explore encoding error messages as return types? Like treating protobuf as if it supported algebraic data types? As an example:
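A sketch of what such a response could look like (the error variants and names below are illustrative, not taken from the PR):

```proto
syntax = "proto3";

// Hypothetical ADT-style response: every domain error is an explicit variant,
// and the successful result is just another variant.
message GetWorkspaceResponse {
  oneof result {
    Workspace workspace = 1;
    WorkspaceNotFoundError not_found = 2;
    PermissionDeniedError permission_denied = 3;
    // Error variants added later show up as an unset oneof for old clients -
    // i.e. the "unknown error" case.
  }
}

message Workspace {
  string id = 1;
}

message WorkspaceNotFoundError {
  string requested_id = 1;
}

message PermissionDeniedError {
  string reason = 1;
}
```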
On the client side we would always have to explicitly check all errors before accessing the "actual" response. And the response itself would also have to be checked for presence, for forward compatibility: if we add an error later, old definitions would see all fields empty. IMO that nicely encodes the "unknown error" case and establishes good client-side code patterns. 🙃
What I like as well is that we clearly separate "domain errors" ("the id you requested does not return a result") from the generic protobuf errors, which are more geared towards service availability / a lower abstraction layer.
```proto
}

message CreateAndStartWorkspaceRequest {
  string idempotency_token = 1;
```
❤️ THE single most important change that brings immediate benefit. Maybe I'm exaggerating... but maybe not. 😛
```proto
oneof source {
  string context_url = 2;
  string prebuild_id = 3;
```
Hm, this feels slightly odd... 🤔 I cannot pinpoint why, though. I'll have to sleep on it.
Why so?
As we now expose the prebuild IDs on the dashboard, it would be nice to actually make use of them instead of going through all of the indirections.
```proto
message WorkspaceContext {
  // Explicit Git context
  message Git {
    string normalized_context_url = 1;
```
What is the normalized context URL? Let's use clone_url instead.
I just found out that some providers might use different URL patterns compared to the context URL, so you cannot just compute the clone URL from it; instead it's a result from external API services which we need to store.
Based on today's discussion between @csweichel, @geropl and myself, I'm happy to land this as is. The context of the discussion is below.
We'll land the PR as is and iterate. The APIs and structure are likely going to change somewhat by the time this becomes an actual public API.
On the topic of errors: once this lands, we'll put up two draft PRs and compare the options; we'll choose based on that.
Description
This PR produces an initial draft of a public API for Gitpod. We will merge this PR once we're happy with a first draft design of the workspaces side of the public API.
Related Issue(s)
Fixes #8681
Part of #7900
Open Questions
We have a bunch of open questions we should explore/answer as part of this PR.
Error Handling
By default gRPC offers error codes. While the categories they introduce work well in practice, categorical errors alone (even with a message) often aren't enough.
Errors as part of the protocol definition
Failure modes and "unhappy paths" are where APIs get interesting. Callers need to know what to expect, and which error cases to handle. If we found a way to communicate those failure modes as part of the API spec, we'd enable future automatic tests and better documentation.
Right now the draft uses comments to describe possible errors. We could consider using custom options for this use case. Either way, we'd have to build tooling to lint and test this.
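One possible shape for such an option (purely a sketch; the option name and field number are made up, not part of this draft):

```proto
syntax = "proto3";

import "google/protobuf/descriptor.proto";

// Hypothetical custom method option listing the error codes a call may return.
extend google.protobuf.MethodOptions {
  repeated string possible_errors = 50000;
}

message GetWorkspaceRequest {
  string workspace_id = 1;
}

message GetWorkspaceResponse {
  // ...
}

service WorkspacesService {
  rpc GetWorkspace(GetWorkspaceRequest) returns (GetWorkspaceResponse) {
    // Failure modes become part of the spec - lintable and testable by tooling.
    option (possible_errors) = "NOT_FOUND";
    option (possible_errors) = "PERMISSION_DENIED";
  }
}
```

Tooling (e.g. a protoc plugin reading these options) could then verify that every RPC declares its failure modes and generate documentation from them.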
Co-locating Workspace and Workspace Instance
In some places it's convenient to retrieve the workspace and its last active instance at the same time, e.g. ListWorkspaces. Without this combination we're likely to see 1+n access patterns, where users ListWorkspaces and then GetLastActiveInstance for each workspace. Retrieving both at the same time can be reasonably expensive though, and is not always necessary. FieldMasks allow clients to specify what they expect in the response, akin to GraphQL. They are however a bit cumbersome to work with, and support for them in all languages is unclear.
Pagination
All List* calls currently support pagination, following a simple pagination model: in the response we return a next_page_token which can be passed back into the List* call to advance to the next page of the paginated result set. It's unclear if we need such capabilities today - we might just as well get away with a result limit and not be able to read "further pages".
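A minimal sketch of how a List request/response could combine a field mask with this page-token model (names are illustrative; the Workspace stub stands in for the message defined elsewhere in this draft):

```proto
syntax = "proto3";

import "google/protobuf/field_mask.proto";

message ListWorkspacesRequest {
  // Hypothetical field mask: clients name the fields they actually need,
  // e.g. ["workspaces.id", "workspaces.last_active_instance"].
  google.protobuf.FieldMask field_mask = 1;

  // Maximum number of results to return in one page.
  int32 page_size = 2;
  // Token from a previous response; empty for the first page.
  string page_token = 3;
}

message ListWorkspacesResponse {
  repeated Workspace workspaces = 1;

  // Pass this back as page_token to fetch the next page; empty when exhausted.
  string next_page_token = 2;
}

message Workspace {
  string id = 1;
}
```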
Linting
We'll have many authors working on this API in the future. Keeping a consistent style, and making sure we're not breaking things down the road, is important. Buf offers such capabilities OOTB. I've added buf linting to the code generation script. We could just as well consider making that part of our CI build.
We will want to add means to detect breaking changes (using buf) in our CI pipeline sooner rather than later.
Rich Errors
Instead of relying on the official error model, we could sport a richer error model. Google APIs, which rely heavily on gRPC, sport such a rich error model. We could re-use this rich error model, which to some extent is supported by client libraries.
We will want to decide on the model early on, because introducing it later on (while maintaining backwards compatibility) would be painful.
👍 Pro:
👎 Con:
(copied from the gRPC docs)
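For reference, the rich error model described in the gRPC docs boils down to a single status payload along these lines (this is the standard google.rpc.Status message, not something new in this PR):

```proto
syntax = "proto3";

package google.rpc;

import "google/protobuf/any.proto";

// The rich error payload used by Google APIs (google/rpc/status.proto).
message Status {
  // The status code, typically an enum value of google.rpc.Code.
  int32 code = 1;

  // A developer-facing error message in English.
  string message = 2;

  // A list of messages that carry the error details,
  // e.g. google.rpc.BadRequest or google.rpc.RetryInfo.
  repeated google.protobuf.Any details = 3;
}
```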
How to test
Not much to test - more to discuss
Release Notes