Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove deprecated Drain field from structs.Node #10202

Merged
merged 16 commits into from
Apr 1, 2021
Merged

Conversation

cgbaker
Copy link
Contributor

@cgbaker cgbaker commented Mar 19, 2021

Note: upgrade testing on this PR should verify that .SecretID is not emitted by the API (using curl, not the Nomad CLI), and that .Drain is emitted.

This is an attempt to remove the Drain field from structs.Node, while still keeping backwards compatibility with the api.Node.Drain field. The questions at play are:

  • do we want to actually delete this field?
  • do we like this JSON encoding extensions approach for API backwards compatibility?

Here's what the PR does:

  • Remove deprecated Drain field from structs/Node and deleted code/tests associated with this field.
  • Adds encoding extensions to put the Node.Drain field back to API responses
  • Modifies the HTTP agent and the event stream to use encoders with these extensions
  • Also removed special code for emptying structs.Node.SecretID, instead allowing the encoder to remove it using a json tag on the struct, which works because (:til:) we have configured msgpack encoding for raft to exclusively consider codec tags
  • Updates the Node drain FSM applier to use msgtype on txn so that drain events are emitted
  • Updates the contributing guide with all this, to guide folks in the future on structs-to-api options.

I added some benchmarking to see whether the JSON encoding extensions were expensive, with the following results:

BenchmarkHTTPServer_JSONEncodingWithExtensions-12       	   92414	     12141 ns/op
BenchmarkHTTPServer_JSONEncodingWithoutExtensions-12    	   94684	     11569 ns/op

So, 600ns cost with the extensions. Do with that what you will. Note, this doesn't compare against doing a struct-to-API conversion as part of encoding (because I didn't want to write it). Also, it's possible that this is slower because of neglected generated structs; I didn't investigate because of https://m.xkcd.com/1319/

nomad/state/events_test.go Outdated Show resolved Hide resolved
nomad/state/events_test.go Outdated Show resolved Hide resolved
cgbaker added 2 commits March 21, 2021 15:30
node drain: use msgtype on txn so that events are emitted
wip: encoding extension to add Node.Drain field back to API responses

new approach for hiding Node.SecretID in the API, using `json` tag
documented this approach in the contributing guide
refactored the JSON handlers with extensions
modified event stream encoding to use the go-msgpack encoders with the extensions
@@ -94,57 +94,6 @@ func TestEventFromChange_ACLTokenSecretID(t *testing.T) {
require.Empty(t, tokenEvent2.ACLToken.SecretID)
}

// TestEventFromChange_NodeSecretID ensures that a node's secret ID is not
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -966,9 +966,10 @@ func (s *StateStore) BatchUpdateNodeDrain(msgType structs.MessageType, index uin
// UpdateNodeDrain is used to update the drain of a node
func (s *StateStore) UpdateNodeDrain(msgType structs.MessageType, index uint64, nodeID string, drain *structs.DrainStrategy, markEligible bool, updatedAt int64, event *structs.NodeEvent) error {

txn := s.db.WriteTxn(index)
txn := s.db.WriteTxnMsgT(msgType, index)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was preventing Node drain completion from firing an event

} else if node.Drain {
// Deprecate in favor of scheduling eligibility and remove post-0.8
return false, "node is draining", nil
return false, "node is not eligible", nil
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed this error message because it seemed wrong. is that allowed? @schmichael , @tgross

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Your message looks right to me.

}
// Ensure SchedulingEligibility is correctly set whenever draining so the plan applier and other scheduling logic
// only need to check SchedulingEligibility when determining whether a placement is feasible on a node.
if n.DrainStrategy != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schmichael , added a comment as requested and tightened this up to make sure we honor the invariant.

@cgbaker cgbaker requested a review from schmichael March 26, 2021 19:01
@cgbaker cgbaker mentioned this pull request Mar 29, 2021
@cgbaker
Copy link
Contributor Author

cgbaker commented Apr 1, 2021

@schmichael , i rolled back the suggested changes around RPC and Node.SecretID. I also cleaned up some code and comments. This PR should be ready to go, if you'll re-review! Thanks!

Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@cgbaker cgbaker merged commit f1cb714 into main Apr 1, 2021
@cgbaker cgbaker deleted the f-node-drain-api branch April 1, 2021 21:48
cgbaker added a commit that referenced this pull request Apr 2, 2021
this was unecessary after refactoring, so this moves them back to their
original location in package structs
tgross added a commit that referenced this pull request Apr 15, 2022
The CSI HTTP API has to transform the CSI volume to redact secrets,
remove the claims fields, and to consolidate the allocation stubs into
a single slice of alloc stubs. This was done manually in #8590 but
this is a large amount of code and has proven both very bug prone
(see #8659, #8666, #8699, #8735, and #12150) and requires updating
lots of code every time we add a field to volumes or plugins.

In #10202 we introduce encoding improvements for the `Node` struct
that allow a more minimal transformation. Apply this same approach to
serializing `structs.CSIVolume` to API responses.

Also, the original reasoning behind #8590 for plugins no longer holds
because the counts are now denormalized within the state store, so we
can simply remove this transformation entirely.
tgross added a commit that referenced this pull request Apr 15, 2022
The CSI HTTP API has to transform the CSI volume to redact secrets,
remove the claims fields, and to consolidate the allocation stubs into
a single slice of alloc stubs. This was done manually in #8590 but
this is a large amount of code and has proven both very bug prone
(see #8659, #8666, #8699, #8735, and #12150) and requires updating
lots of code every time we add a field to volumes or plugins.

In #10202 we introduce encoding improvements for the `Node` struct
that allow a more minimal transformation. Apply this same approach to
serializing `structs.CSIVolume` to API responses.

Also, the original reasoning behind #8590 for plugins no longer holds
because the counts are now denormalized within the state store, so we
can simply remove this transformation entirely.
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants