commit confirmed #154

hellt · 2023-10-11T11:56:26Z

This PR adds a proposal for commit confirmed extension.

Extension description is provided in openconfig/reference#198

commit confirmed initial version moved commit confirmed messages to gnmi_ext file and removed "request suffix" from CommitConfirmed message moved commit confirmed messages to gnmi_ext file and removed "request suffix" from CommitConfirmed message added default rollback timeout clarifications added comments for CommitConfirmedAction enum values and changed ACTION_ACCEPT to ACTION_CONFIRM added comments for commitID message provided link to a future reference doc for the CommitConfirmed message clarified COMMIT_CONFIRMED_ACTION_START for an existing commit aligned file name with existing files fixed wrong doc ref

proto/gnmi_ext/gnmi_ext.proto

ncorran · 2023-10-11T13:18:30Z

proto/gnmi_ext/gnmi_ext.proto

+// The CommitID is used in the subsequent SetRequest/CommitConfirm messages
+// to identify commit to be confirmed, reset or rejected.
+message CommitID {
+  string id = 1;


What is the level of security on the commit id here?

Can I accidentally or maliciously confirm or reject someone else's commit?

What is the scope

Is the commit id expected to be valid only within some scope?

What checks can a server make to see that the CONFIRM/REJECT request comes from the same client as the START (other than commit id match); can we assume the following request will come from a connection/session with the same parameters?

Can any number of commit confirms be open at once? (What is a server wanted to set a max limit)?

Should there be a max timeout, or could a server impose one? (If it can, should the CommitID message also include a duration in the response so the client knows what time they have? Similarly would it be useful to return the server default duration for the case where 0 is input?)

a user executing gNMI RPC is authenticated using the standard auth mechanisms. It might depend on the implementation, but one can expect that a commit confirmed started by user U1 can only be confirmed by U1 user.

I am not aware of NOSes that support multiple parallel commits. If there are NOSes that support multiple parallel commits this spec must be adjusted.

Can that be added into the .md. I think it's important. It means the server should store the U1+commit-id somewhere ready for the future update, and closes a collection of security holes.

Commits were always inline with the SetRequest previously under gNMI, so as soon as the SetRequest was processed and the SetResponse was sent the next commit could continue (might already be queued from another session). It isn't clear what the semantics are now that the commit is 'not confirmed until later'.

Are you intending this to be a pseudo-transaction blocking all other users/sessions from committing while this commit is validated? If so I think perhaps you should rename _START to _START_W_LOCK to capture that semantic. (This then narrows down the use-case for using this extension to one where all requests are funnelled through one agent, or all agents can handle being queued...)

Although it would be possible to enable other commits to continue, it then leaves a question of what to do on timeout/reject, because other commits may have overlapped with this one and an 'undo' of this commit may not leave a valid system state any more. I suspect this mode would be problematic to define. Is your use-case the 'W_LOCK' case?

The proposed commit-confirmed workflow assumes that once the commit-confirmed started, the lock is put in place preventing any other commits from happening.

Any subsequent SetRequest without commit-confirmed extension (a regular SetRequest) will yield an error. This is not explained in openconfig/reference#198 and you made a great point that it needs to be clarified.

Because no parallel commit-confirmed are allowed, it is implicitly assumed that the running datastore is locked for any other commits until the commit in question is confirmed or rejected

Great - yes please clarify in the document, but also I think it would be better to make this explict here with the action names or at least the comments next to the action names:
START has to also LOCK
REJECT/CONFIRM have to UNLOCK
TIMEOUT has to be handled as REJECT (rollback config and also UNLOCK)

I will defer for now and wait for OC/Google comments on this PR since it can be that they would like to proceed with their version which is in #155

ncorran · 2023-10-11T13:25:05Z

proto/gnmi_ext/gnmi_ext.proto

@@ -30,6 +32,7 @@ message Extension {
    // Well known extensions.
    MasterArbitration master_arbitration = 2;  // Master arbitration extension.
    History history = 3;                       // History extension.
+    CommitConfirmed commit_confirmed = 4;      // Commit Confirmed extension.


Is the intention that this is only used on the end of a SetRequest, with the START message an Extension of the original Set that needs confirmation?

How are the REJECT/COMMIT sent in? Is it a follow-up SetRequest with just the origin (& target) set (plus the CommitConfirmed msg). Is it valid to enforce in the server that there is no actual data to set in the message (i.e. its just a wrapper for conveying the next steps in handling a previous SetRequest), or is it expected that a CommitConfirm for a previous SetRequest could be updated in an independent SetRequest?

This is common to any extension.
Rejects and Confirms are sent within the SetRequest message.

Thanks - is also clear in the linked .md you referenced

kidiyoor · 2023-10-11T21:23:41Z

proto/gnmi_ext/gnmi_ext.proto

+// The document about gNMI commit confirmed can be found at
+// https://github.com/openconfig/reference/blob/master/rpc/gnmi/gnmi-commit-confirmed.md
+message CommitConfirmed {
+  CommitConfirmedAction action = 1;


What is the rational behind defining action as an enum over what is proposed in openconfig/reference#197 which is to use oneof id which implicitly models the 3 actions - commit/confirm/cancel(START/CONFIRM/REJECT) and also implicitly enforces required behavior at the proto level, mandating the request to be only one of these actions at a time.

@kidiyoor this is subjective, but it feels more natural to the way users approach commit confirmation via the CLI/Netconf.

When I see commit_id, confirm_id, cancel_id it somehow feels that these may be different IDs, though my understanding is that we don't have vendor implementation which support multiple parallel commits.

Since actions are contained within the ENUM you have the same proto-guarded behavior when only a single action can be present.

To eliminate confusion of the spec appearing to use different IDs - I have updated the oneof to represent actions instead of IDs - https://github.com/openconfig/gnmi/pull/155/files. With this change both the proposal looks identical in terms of semantics and functionality.

I believe that using oneof over ENUM for this particular spec helps define actions much more clearly. Eg. rollback_duration is only valid during commit. oneof enforces that behavior, with ENUMs client can pass rollback_duration field during confirm or cancel request. Its best to avoid unnecessary explanation in documentation if we can enforce them in the spec which makes the API much cleaner.

@kidiyoor @dplore I think one of the fundamental differences between #154 and #155 is that in our proposal, the CommitID message is returned by the gNMI server. In other words, the server assigns Commit IDs and not the client selecting them.

This matched our implementation on both platforms (SR Linux and SR OS), and I feel like this is also true for most other NOSes that currently support commit operations.

I just noticed that @kidiyoor commented about point 1 here #154 (comment)

From our standpoint we can also work with client-selected IDs (like it is done in netconf), so it seems we just need to converge which way to go with the commit-confirmed.

hellt · 2023-10-19T15:09:12Z

@dplore does it make sense to ask for the community to cast votes/comment for #154 and #155 to move the needle forward?

dplore · 2023-10-19T16:54:58Z

@dplore does it make sense to ask for the community to cast votes/comment for #154 and #155 to move the needle forward?

We can take comments now and also add to the agenda for next OC Community review (first Thursday of each month).

The decision to merge will be made by the OC Operators group which has weekly private meetings. We will review and compare both at the Oct 24th occurrence of the OC Operators meeting.

proto/gnmi_ext/gnmi_ext.proto

dplore · 2023-10-24T16:28:55Z

OC Operators reviewed today. There is some slight preference for #154 based on slightly better/more explicit documentation for each field in the proto. The protos seem equivalent in functionality.

An open question for the community is, is #154 is missing anything? Please provide comments in the next two weeks, we'll move to merge this on Nov 7, 2023

Co-authored-by: Darren Loher <[email protected]>

kidiyoor · 2023-10-24T19:05:20Z

It looks like there are two key differences b/w #154 and #155, we should probably clarify for posterity why we are choosing one over the other.

Identifier(ID) generation

As per #154, The Commit ID is generated by the server. Client receives the ID in the response and uses this Commit ID to confirm/cancel the request.

As per #155, that server should not be involved in ID management. Clients should generate and store the ID locally before making the SetRequest, this ID is passed during the request and server registers it, the same ID is used by the clients to confirm/cancel the request.

Strong opinion: There are already systems which deal with configuration generation and have an identifier associated with the generated configuration. We should allow such system to use the same identifier as the commit id. This will enhance the debug ability and visibility of the configuration flow throughout the pipeline.

Use of ENUM vs oneof

I am ok with either, as long as state the reason for the choice.

Opinion: It looks to me that oneof is able to model the three action in a more readable fashion without requiring much explanation. Specially if you look at field like rollback_duration which is only applicable during request, we are able to ensure it in the spec itself. However, this is not a strong opinion, if the community thinks its ok to use ENUM here that's fine.

kidiyoor · 2023-10-31T16:45:42Z

In support of this claim -There are already systems which deal with configuration generation and have an identifier associated with the generated configuration, I want to point out that there is already precedence for this in NETCONF - https://datatracker.ietf.org/doc/html/rfc6241#section-8.4.1

 If the
   <persist> element is given in the confirmed <commit> operation, a
   follow-up commit and the confirming commit can be given on any
   session, and they MUST include a <persist-id> element with a value
   equal to the given value of the <persist> element.

The server expects client to pass the optional during commit operation.

hellt · 2023-11-15T07:40:51Z

replaced by #155

hellt mentioned this pull request Oct 11, 2023

Commit confirmed gNMI extension openconfig/reference#198

Closed

hellt added 2 commits October 11, 2023 15:56

added newline

f71b071

hellt force-pushed the nokia-commit-confirm branch from 8dfe270 to f71b071 Compare October 11, 2023 12:57

ncorran reviewed Oct 11, 2023

View reviewed changes

proto/gnmi_ext/gnmi_ext.proto Show resolved Hide resolved

ncorran reviewed Oct 11, 2023

View reviewed changes

kidiyoor reviewed Oct 11, 2023

View reviewed changes

kidiyoor mentioned this pull request Oct 11, 2023

Support confirmed commit in gNMI SetRequest openconfig/reference#197

Closed

dplore reviewed Oct 24, 2023

View reviewed changes

proto/gnmi_ext/gnmi_ext.proto Outdated Show resolved Hide resolved

mention rollback for a reject action

f619867

Co-authored-by: Darren Loher <[email protected]>

hellt closed this Nov 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

commit confirmed #154

commit confirmed #154

hellt commented Oct 11, 2023 •

edited

Loading

ncorran Oct 11, 2023 •

edited

Loading

hellt Oct 11, 2023

ncorran Oct 11, 2023 •

edited

Loading

hellt Oct 11, 2023

ncorran Oct 13, 2023

hellt Oct 13, 2023 •

edited

Loading

ncorran Oct 11, 2023 •

edited

Loading

hellt Oct 11, 2023

ncorran Oct 11, 2023

kidiyoor Oct 11, 2023

hellt Oct 12, 2023

kidiyoor Oct 20, 2023

hellt Oct 24, 2023 •

edited

Loading

hellt commented Oct 19, 2023

dplore commented Oct 19, 2023

dplore commented Oct 24, 2023

kidiyoor commented Oct 24, 2023 •

edited

Loading

kidiyoor commented Oct 31, 2023 •

edited

Loading

hellt commented Nov 15, 2023

commit confirmed #154

commit confirmed #154

Conversation

hellt commented Oct 11, 2023 • edited Loading

ncorran Oct 11, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ncorran Oct 11, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hellt Oct 13, 2023 • edited Loading

Choose a reason for hiding this comment

ncorran Oct 11, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hellt Oct 24, 2023 • edited Loading

Choose a reason for hiding this comment

hellt commented Oct 19, 2023

dplore commented Oct 19, 2023

dplore commented Oct 24, 2023

kidiyoor commented Oct 24, 2023 • edited Loading

kidiyoor commented Oct 31, 2023 • edited Loading

hellt commented Nov 15, 2023

hellt commented Oct 11, 2023 •

edited

Loading

ncorran Oct 11, 2023 •

edited

Loading

ncorran Oct 11, 2023 •

edited

Loading

hellt Oct 13, 2023 •

edited

Loading

ncorran Oct 11, 2023 •

edited

Loading

hellt Oct 24, 2023 •

edited

Loading

kidiyoor commented Oct 24, 2023 •

edited

Loading

kidiyoor commented Oct 31, 2023 •

edited

Loading