Promoting or inlining platform properties #166

edbaunton · 2020-08-28T12:55:19Z

We discussed this during a monthly meeting (I seem to recall that @ulfjack initially raised it), adding here for tracking.

The current design of platform properties is that they are indirectly embedded in the Command property of the Action. The Action does not send the Command directly but rather sends a Command digest.

The upshot of this is that the client must upload additional blobs before submitting an action, and the server must make additional CAS calls whenever it needs data from the Command, such as the platform properties.

I think the initial discussion specifically mentioned that if the server wanted to make routing decisions for an action based on the platform properties, it would require an additional hit to the CAS to determine those for the action.

It seems to me that some of this extra CAS interaction overhead could be avoided if we inlined the command or platform properties into the action.
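To make the overhead concrete, here is a minimal sketch of the two lookup paths. It is not the real API: `CountingCAS`, the simplified `Action` and `Command` classes, the inlined `Action.platform` field, and the `OSFamily` property are all assumptions made for illustration. The only point is to count CAS reads when a scheduler needs the platform properties.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Tuple

Properties = Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class Command:
    # Simplified stand-in for the Command message.
    arguments: Tuple[str, ...]
    platform: Properties = ()


@dataclass(frozen=True)
class Action:
    # Simplified stand-in for the Action message.
    command_digest: str
    platform: Properties = ()  # hypothetical inlined copy of the properties


class CountingCAS:
    """In-memory stand-in for a CAS that counts how often it is read."""

    def __init__(self) -> None:
        self._blobs: Dict[str, object] = {}
        self.reads = 0

    def put(self, message: object) -> str:
        # repr() of a frozen dataclass is deterministic, so it serves as
        # a toy serialization to hash.
        key = hashlib.sha256(repr(message).encode()).hexdigest()
        self._blobs[key] = message
        return key

    def get(self, key: str) -> object:
        self.reads += 1
        return self._blobs[key]


def platform_via_command(cas: CountingCAS, action_digest: str) -> Dict[str, str]:
    """Current layout: fetch the Action, then the Command it references."""
    action = cas.get(action_digest)
    command = cas.get(action.command_digest)
    return dict(command.platform)


def platform_via_action(cas: CountingCAS, action_digest: str) -> Dict[str, str]:
    """Inlined layout: the Action alone carries the properties."""
    action = cas.get(action_digest)
    return dict(action.platform)
```

In this toy model the current layout costs two sequential reads per routing decision and the inlined layout costs one.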

I can see that we could probably:

1. Extend the action message to carry either `command_digest` or the actual command
2. Inline the platform properties
3. Something else?


bergsieker · 2020-08-28T13:49:10Z

Interesting side note: Platform used to be in Action, and was moved out here <f42e4bb#diff-4153f76ba92d8d30764c0251177105e8> citing potential performance improvements. I suspect those benefits were material for output files, but incidental for Platform.

The Platform/Command split was designed to conserve bandwidth, on the theory that the Command (arguments, output files, platform) is both large and relatively stable, while the Action (particularly input root) changes frequently. It's much more bandwidth-efficient (and storage-efficient) to reference stable parameters by digest, although the vast majority of the savings here is from output files. (Indeed, taken to the limit the Platform should be its own message because it's frequently constant for an entire build (and in some cases even an entire RE deployment), but even our own analysis points out that Platform is also generally small, so it doesn't really matter how we handle it.)

In general, I'm hesitant about optimizing the API structure around specific scheduling implementations because I think it's opening up a can of worms--I could imagine schedulers that would want to route based on arguments or output files, for example. That said, I agree that Platform may be special because one of its chief functions is to enable routing. I'm curious what the actual impact is here--in our case, we're generally talking about Actions with an expected duration of O(seconds), such that the overhead of a one-time fetch of the command is trivial.

Of the proposed options, I think inlining the Platform into the Action is the best. There are very good reasons to keep Action and Command separate, and I think that any option that leaves an optional feature (e.g., allowing Platform in either Action or Command) in place long-term is worse than settling on a single location.
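The bandwidth argument for referencing stable data by digest can be illustrated with a rough sketch. The helper names and the 32-byte digest size are assumptions, and the model ignores the input root, the size field of a real digest, and protobuf encoding overhead. It only compares re-sending every Command against uploading each distinct Command once and referencing it by hash.

```python
import hashlib
from typing import List

DIGEST_SIZE = 32  # bytes in a raw SHA-256 hash


def bytes_with_inlined_command(commands: List[bytes]) -> int:
    """Every action carries its full command, even when it is repeated."""
    return sum(len(command) for command in commands)


def bytes_with_command_digest(commands: List[bytes]) -> int:
    """Each distinct command is uploaded once; every action carries a digest."""
    unique = {hashlib.sha256(command).digest(): len(command) for command in commands}
    return sum(unique.values()) + DIGEST_SIZE * len(commands)
```

For 50 actions that share one 1000-byte command, this toy model gives 50000 bytes when the command is inlined and 2600 bytes when it is referenced by digest. The saving shrinks as the shared message gets smaller, which is consistent with the point that a small Platform gains little from being stored separately.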


ulfjack · 2020-08-28T15:04:52Z

I am concerned about the schedulers having to fetch multiple blobs from the CAS sequentially, and having to parse them into local memory (especially because we expect the Command proto to be large). Both because this is awkward for our implementation, and also because it may enable various denial-of-service vectors. Especially for small clusters, having to overprovision schedulers to safeguard against this can incur significant compute / memory costs. (I'd also be slightly concerned about the Platform proto growing significantly in size.)

I don't think this is over-designing for a specific scheduler implementation. The current protocol requires that the scheduler reads the Action & Command protos, which clearly imposes restrictions on the scheduler design.

I think this may also tie into the proposal by @EdSchouten about making the scheduler untrusted - the more CAS reads are required in the scheduler for routing, the more holes you need to poke into the security model. Personally, I think untrusted schedulers are only viable from a security perspective if either a) the platform proto is sent to the scheduler explicitly, i.e., no direct CAS reads from the scheduler, or b) the Platform proto (and every proto between the execute proto and the platform proto) is stored in a separate CAS service (public CAS / private CAS distinction). Requiring the Command to be public seems like a fairly large information leak.

I'm not sure what the best place to put the Platform proto is. I have a strong preference for moving it out of Command. Ideally, it would not be stored in the CAS at all: that would allow a scheduler design that does routing with the Platform proto only and is also untrusted, without requiring a separate public CAS (or complicating the CAS protocol to allow a public / private distinction). However, I'd be concerned about the platform proto growing to be significantly larger. Over time, I can see us defining hundreds of settings in the platform proto. Maybe a compromise would be to allow the platform proto to be inlined in the execute request or referenced via digest? Too much flexibility?

bergsieker · 2020-08-28T16:09:13Z

> I am concerned about the schedulers having to fetch multiple blobs from the CAS sequentially, and having to parse them into local memory (especially because we *expect* the Command proto to be large). Both because this is awkward for our implementation, and also because it may enable various denial-of-service vectors. Especially for small clusters, having to overprovision schedulers to safeguard against this can incur significant compute / memory costs. (I'd also be slightly concerned about the Platform proto growing significantly in size.)
>
> I don't think this is over-designing for a specific scheduler implementation. The current protocol *requires* that the scheduler reads the Action & Command protos, which clearly imposes restrictions on the scheduler design.

Yeah, I think I can get behind this specifically for the platform. The original move to the Command was mostly incidental, not strongly principled.

> I think this may also tie into the proposal by @EdSchouten about making the scheduler untrusted - the more CAS reads are required in the scheduler for routing, the more holes you need to poke into the security model. Personally, I think untrusted schedulers are only viable from a security perspective if either a) the platform proto is sent to the scheduler explicitly, i.e., no direct CAS reads from the scheduler, or b) the Platform proto (and every proto between the execute proto and the platform proto) is stored in a separate CAS service (public CAS / private CAS distinction). Requiring the Command to be public seems like a fairly large information leak.
>
> I'm not sure what the best place to put the Platform proto is. I have a strong preference for moving it out of Command. Ideally, it would not be stored in the CAS at all: that would allow a scheduler design that does routing with the Platform proto only and is also untrusted, without requiring a separate public CAS (or complicating the CAS protocol to allow a public / private distinction).

In our experience, having the Action (and by extension, the Command) explicitly stored in the CAS provides significant benefits. For example, it allows re-triggering the same action, or downloading the entire Action to re-create it locally (this is the basis for tools_remote). We use the fact that it's stored in the CAS in many other ways, too.

> However, I'd be concerned about the platform proto growing to be significantly larger. Over time, I can see us defining hundreds of settings in the platform proto. Maybe a compromise would be to *allow* the platform proto to be inlined in the execute request or referenced via digest? Too much flexibility?

I believe the "allow" option is too much flexibility. I'd prefer to just move the Platform (obviously with a temporary "allow" option during the transition).
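One way a server might handle such a transition period is sketched below, under stated assumptions: the simplified `Action` and `Command` classes, the `effective_platform` helper, and the lazy `fetch_command` callback are hypothetical and not part of the API. The idea is to prefer properties carried on the Action and to fall back to the Command only for clients that have not migrated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Command:
    # Legacy location of the properties (simplified stand-in).
    platform: Optional[Dict[str, str]] = None


@dataclass
class Action:
    # New location of the properties (simplified stand-in).
    platform: Optional[Dict[str, str]] = None


def effective_platform(
    action: Action, fetch_command: Callable[[], Command]
) -> Dict[str, str]:
    """Return the properties to route on during the transition.

    fetch_command stands in for the extra CAS read. It is only invoked
    when the Action carries no platform of its own.
    """
    if action.platform is not None:
        return action.platform
    command = fetch_command()
    return command.platform or {}
```

With this shape, migrated clients never trigger the extra read, and the fallback branch can be deleted once the old location is no longer supported.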


Platform properties are currently a member of the `command` message, which is referred to in the action by digest. This requires the Execution Service to make a call to the CAS to retrieve the contents of the command if it wishes to inspect it.

The platform properties are commonly used for making routing decisions about the action. Therefore, in order to route an action, common execution service implementations must introduce an additional call to the CAS to fully hydrate the action and determine where it should be routed.

This commit promotes the platform properties from the command to the action. We deprecate the platform properties contained within the command and bump the minor version to version 2.2, following the model for `output_paths`.

Fixes #166

Signed-off-by: Ed Baunton <[email protected]>

edbaunton · 2020-09-01T22:54:53Z

I think #167 satisfies the above discussion, except for the case of a large set of platform properties that @ulfjack mentions. To support that case, I think we would need something much more sophisticated, such as splitting platform properties into those that are placed inline and those that reside in the command.


edbaunton mentioned this issue Sep 1, 2020

RemoteEx 2.2: Promote platform properties from command to action #167

Merged

sstriker closed this as completed in #167 Oct 17, 2020
