Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

commit confirmed #154

Closed
wants to merge 3 commits into from
Closed

Conversation

hellt
Copy link
Contributor

@hellt hellt commented Oct 11, 2023

This PR adds a proposal for commit confirmed extension.

Extension description is provided in openconfig/reference#198

commit confirmed initial version

moved commit confirmed messages to gnmi_ext file

and removed "request suffix" from CommitConfirmed message

moved commit confirmed messages to gnmi_ext file

and removed "request suffix" from CommitConfirmed message

added default rollback timeout clarifications

added comments for CommitConfirmedAction enum values

and changed ACTION_ACCEPT to ACTION_CONFIRM

added comments for commitID message

provided link to a future reference doc for the CommitConfirmed message

clarified COMMIT_CONFIRMED_ACTION_START for an existing commit

aligned file name with existing files

fixed wrong doc ref
// The CommitID is used in the subsequent SetRequest/CommitConfirm messages
// to identify commit to be confirmed, reset or rejected.
message CommitID {
string id = 1;
Copy link

@ncorran ncorran Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. What is the level of security on the commit id here?
    • Can I accidentally or maliciously confirm or reject someone else's commit?
  2. What is the scope
  • Is the commit id expected to be valid only within some scope?
  • What checks can a server make to see that the CONFIRM/REJECT request comes from the same client as the START (other than commit id match); can we assume the following request will come from a connection/session with the same parameters?
  1. Can any number of commit confirms be open at once? (What is a server wanted to set a max limit)?
  2. Should there be a max timeout, or could a server impose one? (If it can, should the CommitID message also include a duration in the response so the client knows what time they have? Similarly would it be useful to return the server default duration for the case where 0 is input?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. a user executing gNMI RPC is authenticated using the standard auth mechanisms. It might depend on the implementation, but one can expect that a commit confirmed started by user U1 can only be confirmed by U1 user.
  2. I am not aware of NOSes that support multiple parallel commits. If there are NOSes that support multiple parallel commits this spec must be adjusted.

Copy link

@ncorran ncorran Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can that be added into the .md. I think it's important. It means the server should store the U1+commit-id somewhere ready for the future update, and closes a collection of security holes.

  2. Commits were always inline with the SetRequest previously under gNMI, so as soon as the SetRequest was processed and the SetResponse was sent the next commit could continue (might already be queued from another session). It isn't clear what the semantics are now that the commit is 'not confirmed until later'.

Are you intending this to be a pseudo-transaction blocking all other users/sessions from committing while this commit is validated? If so I think perhaps you should rename _START to _START_W_LOCK to capture that semantic. (This then narrows down the use-case for using this extension to one where all requests are funnelled through one agent, or all agents can handle being queued...)

Although it would be possible to enable other commits to continue, it then leaves a question of what to do on timeout/reject, because other commits may have overlapped with this one and an 'undo' of this commit may not leave a valid system state any more. I suspect this mode would be problematic to define. Is your use-case the 'W_LOCK' case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed commit-confirmed workflow assumes that once the commit-confirmed started, the lock is put in place preventing any other commits from happening.

Any subsequent SetRequest without commit-confirmed extension (a regular SetRequest) will yield an error. This is not explained in openconfig/reference#198 and you made a great point that it needs to be clarified.

Because no parallel commit-confirmed are allowed, it is implicitly assumed that the running datastore is locked for any other commits until the commit in question is confirmed or rejected

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great - yes please clarify in the document, but also I think it would be better to make this explict here with the action names or at least the comments next to the action names:
START has to also LOCK
REJECT/CONFIRM have to UNLOCK
TIMEOUT has to be handled as REJECT (rollback config and also UNLOCK)

Copy link
Contributor Author

@hellt hellt Oct 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will defer for now and wait for OC/Google comments on this PR since it can be that they would like to proceed with their version which is in #155

@@ -30,6 +32,7 @@ message Extension {
// Well known extensions.
MasterArbitration master_arbitration = 2; // Master arbitration extension.
History history = 3; // History extension.
CommitConfirmed commit_confirmed = 4; // Commit Confirmed extension.
Copy link

@ncorran ncorran Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intention that this is only used on the end of a SetRequest, with the START message an Extension of the original Set that needs confirmation?

How are the REJECT/COMMIT sent in? Is it a follow-up SetRequest with just the origin (& target) set (plus the CommitConfirmed msg). Is it valid to enforce in the server that there is no actual data to set in the message (i.e. its just a wrapper for conveying the next steps in handling a previous SetRequest), or is it expected that a CommitConfirm for a previous SetRequest could be updated in an independent SetRequest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is common to any extension.
Rejects and Confirms are sent within the SetRequest message.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - is also clear in the linked .md you referenced

// The document about gNMI commit confirmed can be found at
// https://github.com/openconfig/reference/blob/master/rpc/gnmi/gnmi-commit-confirmed.md
message CommitConfirmed {
CommitConfirmedAction action = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the rational behind defining action as an enum over what is proposed in openconfig/reference#197 which is to use oneof id which implicitly models the 3 actions - commit/confirm/cancel(START/CONFIRM/REJECT) and also implicitly enforces required behavior at the proto level, mandating the request to be only one of these actions at a time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kidiyoor this is subjective, but it feels more natural to the way users approach commit confirmation via the CLI/Netconf.

When I see commit_id, confirm_id, cancel_id it somehow feels that these may be different IDs, though my understanding is that we don't have vendor implementation which support multiple parallel commits.

Since actions are contained within the ENUM you have the same proto-guarded behavior when only a single action can be present.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To eliminate confusion of the spec appearing to use different IDs - I have updated the oneof to represent actions instead of IDs - https://github.com/openconfig/gnmi/pull/155/files. With this change both the proposal looks identical in terms of semantics and functionality.

I believe that using oneof over ENUM for this particular spec helps define actions much more clearly. Eg. rollback_duration is only valid during commit. oneof enforces that behavior, with ENUMs client can pass rollback_duration field during confirm or cancel request. Its best to avoid unnecessary explanation in documentation if we can enforce them in the spec which makes the API much cleaner.

Copy link
Contributor Author

@hellt hellt Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kidiyoor @dplore I think one of the fundamental differences between #154 and #155 is that in our proposal, the CommitID message is returned by the gNMI server. In other words, the server assigns Commit IDs and not the client selecting them.

This matched our implementation on both platforms (SR Linux and SR OS), and I feel like this is also true for most other NOSes that currently support commit operations.

I just noticed that @kidiyoor commented about point 1 here #154 (comment)

From our standpoint we can also work with client-selected IDs (like it is done in netconf), so it seems we just need to converge which way to go with the commit-confirmed.

@hellt
Copy link
Contributor Author

hellt commented Oct 19, 2023

@dplore does it make sense to ask for the community to cast votes/comment for #154 and #155 to move the needle forward?

@dplore
Copy link
Member

dplore commented Oct 19, 2023

@dplore does it make sense to ask for the community to cast votes/comment for #154 and #155 to move the needle forward?

We can take comments now and also add to the agenda for next OC Community review (first Thursday of each month).

The decision to merge will be made by the OC Operators group which has weekly private meetings. We will review and compare both at the Oct 24th occurrence of the OC Operators meeting.

@dplore
Copy link
Member

dplore commented Oct 24, 2023

OC Operators reviewed today. There is some slight preference for #154 based on slightly better/more explicit documentation for each field in the proto. The protos seem equivalent in functionality.

An open question for the community is, is #154 is missing anything? Please provide comments in the next two weeks, we'll move to merge this on Nov 7, 2023

@kidiyoor
Copy link
Contributor

kidiyoor commented Oct 24, 2023

It looks like there are two key differences b/w #154 and #155, we should probably clarify for posterity why we are choosing one over the other.

  1. Identifier(ID) generation

As per #154, The Commit ID is generated by the server. Client receives the ID in the response and uses this Commit ID to confirm/cancel the request.

As per #155, that server should not be involved in ID management. Clients should generate and store the ID locally before making the SetRequest, this ID is passed during the request and server registers it, the same ID is used by the clients to confirm/cancel the request.

Strong opinion: There are already systems which deal with configuration generation and have an identifier associated with the generated configuration. We should allow such system to use the same identifier as the commit id. This will enhance the debug ability and visibility of the configuration flow throughout the pipeline.

  1. Use of ENUM vs oneof

I am ok with either, as long as state the reason for the choice.

Opinion: It looks to me that oneof is able to model the three action in a more readable fashion without requiring much explanation. Specially if you look at field like rollback_duration which is only applicable during request, we are able to ensure it in the spec itself. However, this is not a strong opinion, if the community thinks its ok to use ENUM here that's fine.

@kidiyoor
Copy link
Contributor

kidiyoor commented Oct 31, 2023

In support of this claim -There are already systems which deal with configuration generation and have an identifier associated with the generated configuration, I want to point out that there is already precedence for this in NETCONF - https://datatracker.ietf.org/doc/html/rfc6241#section-8.4.1

 If the
   <persist> element is given in the confirmed <commit> operation, a
   follow-up commit and the confirming commit can be given on any
   session, and they MUST include a <persist-id> element with a value
   equal to the given value of the <persist> element.

The server expects client to pass the optional during commit operation.

@hellt
Copy link
Contributor Author

hellt commented Nov 15, 2023

replaced by #155

@hellt hellt closed this Nov 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants