Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

commit confirmed #154

Closed
wants to merge 3 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
47 changes: 47 additions & 0 deletions proto/gnmi_ext/gnmi_ext.proto
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,8 @@ syntax = "proto3";
// extensions defined outside of this package.
package gnmi_ext;

import "google/protobuf/duration.proto";

option go_package = "github.com/openconfig/gnmi/proto/gnmi_ext";

// The Extension message contains a single gNMI extension.
Expand All @@ -30,6 +32,7 @@ message Extension {
// Well known extensions.
MasterArbitration master_arbitration = 2; // Master arbitration extension.
History history = 3; // History extension.
CommitConfirmed commit_confirmed = 4; // Commit Confirmed extension.
Copy link

@ncorran ncorran Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intention that this is only used on the end of a SetRequest, with the START message an Extension of the original Set that needs confirmation?

How are the REJECT/COMMIT sent in? Is it a follow-up SetRequest with just the origin (& target) set (plus the CommitConfirmed msg). Is it valid to enforce in the server that there is no actual data to set in the message (i.e. its just a wrapper for conveying the next steps in handling a previous SetRequest), or is it expected that a CommitConfirm for a previous SetRequest could be updated in an independent SetRequest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is common to any extension.
Rejects and Confirms are sent within the SetRequest message.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - is also clear in the linked .md you referenced

}
}

Expand Down Expand Up @@ -89,3 +92,47 @@ message TimeRange {
int64 start = 1; // Nanoseconds since the epoch
int64 end = 2; // Nanoseconds since the epoch
}

// The CommitID is returned by the gNMI server to indicate the ID of a commit.
// It is returned in response to the Set Request with the CommitConfirm message
// that had COMMIT_CONFIRMED_ACTION_START set.
// The CommitID is used in the subsequent SetRequest/CommitConfirm messages
// to identify commit to be confirmed, reset or rejected.
message CommitID {
string id = 1;
Copy link

@ncorran ncorran Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. What is the level of security on the commit id here?
    • Can I accidentally or maliciously confirm or reject someone else's commit?
  2. What is the scope
  • Is the commit id expected to be valid only within some scope?
  • What checks can a server make to see that the CONFIRM/REJECT request comes from the same client as the START (other than commit id match); can we assume the following request will come from a connection/session with the same parameters?
  1. Can any number of commit confirms be open at once? (What is a server wanted to set a max limit)?
  2. Should there be a max timeout, or could a server impose one? (If it can, should the CommitID message also include a duration in the response so the client knows what time they have? Similarly would it be useful to return the server default duration for the case where 0 is input?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. a user executing gNMI RPC is authenticated using the standard auth mechanisms. It might depend on the implementation, but one can expect that a commit confirmed started by user U1 can only be confirmed by U1 user.
  2. I am not aware of NOSes that support multiple parallel commits. If there are NOSes that support multiple parallel commits this spec must be adjusted.

Copy link

@ncorran ncorran Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can that be added into the .md. I think it's important. It means the server should store the U1+commit-id somewhere ready for the future update, and closes a collection of security holes.

  2. Commits were always inline with the SetRequest previously under gNMI, so as soon as the SetRequest was processed and the SetResponse was sent the next commit could continue (might already be queued from another session). It isn't clear what the semantics are now that the commit is 'not confirmed until later'.

Are you intending this to be a pseudo-transaction blocking all other users/sessions from committing while this commit is validated? If so I think perhaps you should rename _START to _START_W_LOCK to capture that semantic. (This then narrows down the use-case for using this extension to one where all requests are funnelled through one agent, or all agents can handle being queued...)

Although it would be possible to enable other commits to continue, it then leaves a question of what to do on timeout/reject, because other commits may have overlapped with this one and an 'undo' of this commit may not leave a valid system state any more. I suspect this mode would be problematic to define. Is your use-case the 'W_LOCK' case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed commit-confirmed workflow assumes that once the commit-confirmed started, the lock is put in place preventing any other commits from happening.

Any subsequent SetRequest without commit-confirmed extension (a regular SetRequest) will yield an error. This is not explained in openconfig/reference#198 and you made a great point that it needs to be clarified.

Because no parallel commit-confirmed are allowed, it is implicitly assumed that the running datastore is locked for any other commits until the commit in question is confirmed or rejected

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great - yes please clarify in the document, but also I think it would be better to make this explict here with the action names or at least the comments next to the action names:
START has to also LOCK
REJECT/CONFIRM have to UNLOCK
TIMEOUT has to be handled as REJECT (rollback config and also UNLOCK)

Copy link
Contributor Author

@hellt hellt Oct 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will defer for now and wait for OC/Google comments on this PR since it can be that they would like to proceed with their version which is in #155

}

// The CommitConfirmed allows automatic rollback of a commit
// if Set changes are not confirmed within the specified timeout.
// The document about gNMI commit confirmed can be found at
// https://github.com/openconfig/reference/blob/master/rpc/gnmi/gnmi-commit-confirmed.md
hellt marked this conversation as resolved.
Show resolved Hide resolved
message CommitConfirmed {
CommitConfirmedAction action = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the rational behind defining action as an enum over what is proposed in openconfig/reference#197 which is to use oneof id which implicitly models the 3 actions - commit/confirm/cancel(START/CONFIRM/REJECT) and also implicitly enforces required behavior at the proto level, mandating the request to be only one of these actions at a time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kidiyoor this is subjective, but it feels more natural to the way users approach commit confirmation via the CLI/Netconf.

When I see commit_id, confirm_id, cancel_id it somehow feels that these may be different IDs, though my understanding is that we don't have vendor implementation which support multiple parallel commits.

Since actions are contained within the ENUM you have the same proto-guarded behavior when only a single action can be present.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To eliminate confusion of the spec appearing to use different IDs - I have updated the oneof to represent actions instead of IDs - https://github.com/openconfig/gnmi/pull/155/files. With this change both the proposal looks identical in terms of semantics and functionality.

I believe that using oneof over ENUM for this particular spec helps define actions much more clearly. Eg. rollback_duration is only valid during commit. oneof enforces that behavior, with ENUMs client can pass rollback_duration field during confirm or cancel request. Its best to avoid unnecessary explanation in documentation if we can enforce them in the spec which makes the API much cleaner.

Copy link
Contributor Author

@hellt hellt Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kidiyoor @dplore I think one of the fundamental differences between #154 and #155 is that in our proposal, the CommitID message is returned by the gNMI server. In other words, the server assigns Commit IDs and not the client selecting them.

This matched our implementation on both platforms (SR Linux and SR OS), and I feel like this is also true for most other NOSes that currently support commit operations.

I just noticed that @kidiyoor commented about point 1 here #154 (comment)

From our standpoint we can also work with client-selected IDs (like it is done in netconf), so it seems we just need to converge which way to go with the commit-confirmed.

// rollback timeout duration expressed in seconds and nanoseconds.
// If timeout is not set (or seconds and ns are both 0), then
// the default gNMI server's rollback timeout duration is used.
google.protobuf.Duration timeout = 2;
// CommitID from commit for ACCEPT or REJECT
CommitID commit_id = 3;
}

// A COMMIT_CONFIRMED_ACTION_START starts a commit with a configured rollback
// timeout and will return the CommitID assigned by the server. The CommitID is
// used to identify the current commit. The COMMIT_CONFIRMED_ACTION_CONFIRM and
// COMMIT_CONFIRMED_ACTION_REJECT confirms or cancels a commit using the
// commit_id.
enum CommitConfirmedAction {
// gNMI server should return InvalidArgument error if the action is not
// specified for a CommitConfirmed extension.
COMMIT_CONFIRMED_ACTION_UNSPECIFIED = 0;
// Starts a commit with a specified rollback timeout
// or resets the timeout for the commit that has confirmation pending.
COMMIT_CONFIRMED_ACTION_START = 1;
// Confirm a commit identified by CommitID.
COMMIT_CONFIRMED_ACTION_CONFIRM = 2;
// Reject and rollback a commit identified by CommitID
// for which rollback timeout has not expired.
// If no active commit with the non expired rollback timeout is found,
// gNMI server should return InvalidArgument error.
COMMIT_CONFIRMED_ACTION_REJECT = 3;
}