Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

func: add new picker dependency #20029

Merged
merged 18 commits into from
Mar 15, 2024
Merged

func: add new picker dependency #20029

merged 18 commits into from
Mar 15, 2024

Conversation

Juanadelacuesta
Copy link
Member

@Juanadelacuesta Juanadelacuesta commented Feb 22, 2024

This PR introduces the new options for reconciling a reconnecting allocation and its replacement:

  • Best score (Current implementation)
  • Keep original
  • Keep replacement
  • Keep the one that has run the longest time

It is achieved by adding a new dependency to the allocReconciler that calls the corresponding function depending on the task group's disconnect strategy. For more detailed information, refer to the RFC

It resolves 15144

nomad/structs/structs.go Outdated Show resolved Hide resolved
scheduler/reconcile_util.go Outdated Show resolved Hide resolved
nomad/structs/structs.go Outdated Show resolved Hide resolved
nomad/structs/structs.go Outdated Show resolved Hide resolved
nomad/structs/structs_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@lgfa29 lgfa29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting stuff! The reconnect logic has always been confusing to me from an user perspective 😅

May main concern here is the pickLongestRunning logic. We may need to take a different approach to make this decision.

nomad/structs/group.go Outdated Show resolved Hide resolved
Comment on lines 11077 to 11078
// ShouldBeReplaced tests an alloc for replace in case of disconnection
func (a *Allocation) ShouldBeReplaced() *bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does a nil return represent in this case?

It's probably worth documenting it in the docstring since it makes the return value a tri-state.

Comment on lines 11446 to 11448
// GetLeaderTask will return the leader task in the allocation
// if there is one, otherwise it will return an empty task.
func (a *Allocation) GetLeaderTask() Task {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// GetLeaderTask will return the leader task in the allocation
// if there is one, otherwise it will return an empty task.
func (a *Allocation) GetLeaderTask() Task {
// LeaderTask will return the leader task in the allocation
// if there is one, otherwise it will return an empty task.
func (a *Allocation) LeaderTask() *Task {

Gettters in Go don't usually have a Get prefix, and we probably want to return the task pointer to avoid a copy?

I was curious about the difference because these assumptions can be tricky to reason about, so I wrote a quick benchmark. Returning the pointer seems to be an order of magnitude faster 😄

BenchmarkTaskReturn-10          29234692                41.52 ns/op
BenchmarkTaskReturn2-10         295444098                4.086 ns/op

Comment on lines 11457 to 11473
switch len(tg.Tasks) {
case 0:
return task

case 1:
task = *tg.Tasks[0]

default:
for _, t := range tg.Tasks {
if t.Leader {
task = *t
break
}
}
}

return task
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
switch len(tg.Tasks) {
case 0:
return task
case 1:
task = *tg.Tasks[0]
default:
for _, t := range tg.Tasks {
if t.Leader {
task = *t
break
}
}
}
return task
switch len(tg.Tasks) {
case 0:
return task
case 1:
return *tg.Tasks[0]
}
for _, t := range tg.Tasks {
if t.Leader {
return *t
}
}
return task

Minor nit-pick, but I think this is a little easier to read?

Comment on lines 11476 to 11479
// LatestStartOfTask returns the time of the last start event for the given task
// using the allocations TaskStates. If the task has not started, the zero time
// will be returned.
func (a *Allocation) LatestStartOfTask(taskName string) time.Time {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// LatestStartOfTask returns the time of the last start event for the given task
// using the allocations TaskStates. If the task has not started, the zero time
// will be returned.
func (a *Allocation) LatestStartOfTask(taskName string) time.Time {
// LastTaskStart returns the time of the last start event for the given task
// using the allocations TaskStates. If the task has not started, the zero time
// will be returned.
func (a *Allocation) LastTaskStart(taskName string) time.Time {

Another nit-pick 😅

This name matches the name of LastUnknown, which returns a similar result. It could also be useful to move the method so it's closer to that one.

nomad/nomad/structs/structs.go

Lines 11394 to 11409 in 3193ac2

// LastUnknown returns the timestamp for the last time the allocation
// transitioned into the unknown client status.
func (a *Allocation) LastUnknown() time.Time {
var lastUnknown time.Time
for _, s := range a.AllocStates {
if s.Field == AllocStateFieldClientStatus &&
s.Value == AllocClientStatusUnknown {
if lastUnknown.IsZero() || lastUnknown.Before(s.Time) {
lastUnknown = s.Time
}
}
}
return lastUnknown.UTC()
}

Comment on lines +36 to +38
type ReconnectingPicker interface {
PickReconnectingAlloc(disconnect *structs.DisconnectStrategy, original *structs.Allocation, replacement *structs.Allocation) *structs.Allocation
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this interface? reconnectingpicker.ReconnectingPicker seems to be the only implementation.

I could see a case where the drain strategy could be an interface with one implementation per strategy, but ReconnectingPicker seems to be handling all of them already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im aiming at removing the responsibility of the picking from the allocReconciler, that is the main reason of te interface :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interfaces in Go are a bit strange, and work differently than in other languages, like Java or C#.

Since they're implicitly defined pre-creating them can often lead to a confusing implementation. Specially for people new to the code base, trying to "jump to definition" in an interface is quite frustrating 😬

Sometimes that's unavoidable because we want to have multiple implementations (like a real thing and a test implementation), or if want to restrict what to expose (like only exposing certain methods of a struct).

But in this case I can't really think of a different implementations of ReconnectingPicker. The entire decision is already encapsulated in the reconnectingpicker.ReconnectingPicker implementation (the fact they have the same name kind of point to that as well 😅).

So we don't need this interface to remove responsibility from the reconciler. Using the concrete struct already accomplishes that. And having it creates unnecessary indirection that makes the code harder to follow.

If, later on, we do find a need for an interface, the implicit nature of Go's interface can help. If the reconciler ends up calling X(), Y(), and Z(), where each behaves differently under some conditions, we can easily add them to a new interface without much code change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having that interface in place gives us a define frontier between the alloc reconciler and the reconnecting picker: Imagine we refactor the reconciler and we want to unit test the logic. If we remove the interface, we remove the possibility of having a look at the internals of the reconciliation, we will need to write tests that also take into account the picker logic, no way to isolate components. If we leave the interface where it is, it is the perfect cutting point where we can add a test implementation of the picker just for the purpose of verifying the behaviour of the reconciliation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic in PickReconnectingAlloc is already pretty self-contained, it doesn't access external data and doesn't cause any side-effects, so I'm not sure we would need to isolate it in tests. I would expect the need would have showed up in the PR already.

But nevertheless, the nice things about implicit interfaces is that, in case we do find ourselves needing alternative implementation, the only line we would need to change is the type of reconnectingPicker, everything else stays the same.

reconnectingPicker ReconnectingPicker

The part of the code that has the biggest impact (and it's not even that big) on the scenario you mentioned is this one:

reconnectingPicker: reconnectingpicker.New(logger),

This is what is preventing us from feeding the reconciler with different implementation because the reconciler itself is making a decision on what to instantiate, instead of the dependency being fed externally, so it's violating the isolation you mentioned.

But, for the same reason we don't need this interface, we don't need to change this logic. If the need does arise to provide alternate implementation, it's pretty straightforward to update the code. But until then, this interfaces creates unnecessary indirection, making the code hard to read.

https://100go.co/5-interface-pollution/ has some good discussion about interfaces in Go.

Copy link
Member Author

@Juanadelacuesta Juanadelacuesta Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is meant to isolate the logic of the reconciler, not of the picker. When we get to the point of refactoring the reconciler, we can add a configuration option to change the picker, as done in other parts of the code. I you ask me, there are not enough interfaces in the nomad code, isolating components is almost impossible in some parts :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're refactoring the reconciler then adding this interface if necessary won't be any extra work. But until then, this interface is not bringing any benefits, and it's making the code harder to read.

Comment on lines 24 to 31
// Check if the replacement is newer.
// Always prefer the replacement if true.
replacementIsNewer := replacement.Job.Version > original.Job.Version ||
replacement.Job.CreateIndex > original.Job.CreateIndex
if replacementIsNewer {
rp.logger.Debug("replacement has a newer version, keeping replacement")
return replacement
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Check if the replacement is newer.
// Always prefer the replacement if true.
replacementIsNewer := replacement.Job.Version > original.Job.Version ||
replacement.Job.CreateIndex > original.Job.CreateIndex
if replacementIsNewer {
rp.logger.Debug("replacement has a newer version, keeping replacement")
return replacement
}
// Check if the replacement is for a newer job version.
// Always prefer the replacement if true.
replacementIsNewer := replacement.Job.Version > original.Job.Version ||
replacement.Job.CreateIndex > original.Job.CreateIndex
if replacementIsNewer {
rp.logger.Debug("replacement has a newer job version, keeping replacement")
return replacement
}

I think the replacement itself would always be newer right? 😅

So just making this a bit more explicit.

Comment on lines 33 to 35
if result != replacement {
t.Fatalf("expected replacement, got %v", result)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if result != replacement {
t.Fatalf("expected replacement, got %v", result)
}
must.Eq(t, replacement, result)

I think this works.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I blame coPilot! jejej

Comment on lines +26 to +31
replacement := &structs.Allocation{
Job: &structs.Job{
Version: 2,
CreateIndex: 2,
},
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be useful to matrix test each field being modified independently. Testing for a change in both can hide bugs in one of the two checks.

Comment on lines 94 to 107
func (rp *ReconnectingPicker) pickLongestRunning(original *structs.Allocation, replacement *structs.Allocation) *structs.Allocation {
lt := original.GetLeaderTask()

// Default to the first task in the group if no leader is found.
if lt.Name == "" {
lt = *original.Job.LookupTaskGroup(original.TaskGroup).Tasks[0]
}

// If the replacement has a later start time, keep the original.
if original.LatestStartOfTask(lt.Name).Sub(replacement.LatestStartOfTask(lt.Name)) < 0 {
return original
}

return replacement
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few subtleties here that worry me a bit.

  1. Task states are ephemeral, I think we only keep like 10 around? So for a long-living task I would expect the start event to have been long gone, specially if it uses features like template where each change will generate a handful of events. So in a more realistic scenario we would end up with original.LatestStartOfTask and replacement.LatestStartOfTask being both zero 😬
  2. This is only checking the leader task, or the first one, so it's not taking into consideration aspects such as task lifecycle. If the first task is a prestart sidecar, for example, I think this would give unexpected results.

I think a better approach would be use one of LastRestart or StartedAt (or probably a combination of both) from the alloc's TaskState?

nomad/nomad/structs/structs.go

Lines 8816 to 8826 in 3193ac2

// LastRestart is the time the task last restarted. It is updated each time the
// task restarts
LastRestart time.Time
// StartedAt is the time the task is started. It is updated each time the
// task starts
StartedAt time.Time
// FinishedAt is the time at which the task transitioned to dead and will
// not be started again.
FinishedAt time.Time

We would also need to compare all tasks (or maybe at lest filter by lifecycle) so that we get a more general sense of what to consider "longest running" allocation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let me add a new filter to use a task from the main cycle :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking for main is good, but I think the biggest issue is relying on a specific task event. We can't really do that since we only store a limited number of them.

I think we need to rethink how we define longest running. The TaskState I mention could be helpful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reimplemented! Now it uses the StartAt, Restarts and LastRestart fields.

Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Once that discussion with @lgfa29 about the interface is resolved this should be good-to-go.

nomad/structs/structs.go Outdated Show resolved Hide resolved
nomad/structs/structs_test.go Outdated Show resolved Hide resolved
@@ -11074,6 +11074,16 @@ func (a *Allocation) NextRescheduleTimeByTime(t time.Time) (time.Time, bool) {
return a.nextRescheduleTime(t, reschedulePolicy)
}

func (a *Allocation) RescheduleTimeOnDisconnect(now time.Time) (time.Time, bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have a godoc comment on this one. And do we have a path to deprecate the previous behaviour?

nomad/structs/structs.go Outdated Show resolved Hide resolved
Comment on lines 11458 to 11476
var task *Task

// flag used to avoid traversing the tasks twice in case there is no defined
// leader.
mainTaskFound := false

for _, t := range tg.Tasks {
if t.Leader {
task = t
break
}

if !mainTaskFound && t.IsMain() {
mainTaskFound = true
task = t
}
}

return task
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var task *Task
// flag used to avoid traversing the tasks twice in case there is no defined
// leader.
mainTaskFound := false
for _, t := range tg.Tasks {
if t.Leader {
task = t
break
}
if !mainTaskFound && t.IsMain() {
mainTaskFound = true
task = t
}
}
return task
for _, t := range tg.Tasks {
if t.Leader || t.IsMain() {
return t
}
}
return nil

I think this does the same thing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this function is used in pickLongestRunning, but it doesn't seem guarantee that the task returned is the longest running? If the leader or the first main task restarts the its StartedAt counter would reset.

Would it make sense to instead traverse all main tasks and returns the longest period any of the tasks in the group has been running for?

Comment on lines +36 to +38
type ReconnectingPicker interface {
PickReconnectingAlloc(disconnect *structs.DisconnectStrategy, original *structs.Allocation, replacement *structs.Allocation) *structs.Allocation
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're refactoring the reconciler then adding this interface if necessary won't be any extra work. But until then, this interface is not bringing any benefits, and it's making the code harder to read.

scheduler/reconnecting_picker/reconnecting_picker.go Outdated Show resolved Hide resolved
scheduler/reconnecting_picker/reconnecting_picker.go Outdated Show resolved Hide resolved
Comment on lines 11441 to 11476
// LeaderAndMainTasksInGroup returns two slices: one with the leader tasks and
// another with the main tasks that are not leader in the given task group.
// If the task group is no longer present in the allocation definition
// or there are no leaders and the main body the task is empty both slices will
// be nil. If the task group has only one task, both slices will contain that task.
//
// This function is optimized to avoid traversing the task group tasks more than
// once in case there is no leader defined.
func (a *Allocation) LeaderAndMainTasksInGroup(tg *TaskGroup) ([]*Task, []*Task) {
if tg == nil {
return nil, nil
}

switch len(tg.Tasks) {
case 0:
return nil, nil

case 1:
return tg.Tasks, tg.Tasks
}

var leaderTasks, mainTasks []*Task

for _, t := range tg.Tasks {
if t.Leader {
leaderTasks = append(leaderTasks, t)
continue
}

if t.IsMain() {
mainTasks = append(mainTasks, t)
}
}

return leaderTasks, mainTasks
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this method is only used in the reconciler (a part of our code that is called very often), it may be better to avoid allocating the slices and have the loop in StartOfOldestTask. Then we only need to keep the oldest time stored.

Some untested sample code:

func (a *Allocation) StartOfOldestTask(tg *TaskGroup) time.Time {
	now := time.Now().UTC()
	oldestStart := now

	for _, t := range tg.Tasks {
		if t.Leader {
			return a.LastStartOfTask(t.Name)
		}

		if t.IsMain() {
			ls := a.LastStartOfTask(t.Name)
			if !ls.IsZero() && ls.Before(oldestStart) {
				oldestStart = ls
			}
		}
	}

	return oldestStart
}

Since a group can only have a one leader, this also has the added benefit of cutting the loop short as soon as a leader is found.

nomad/nomad/structs/structs.go

Lines 6984 to 6986 in 428103b

if leaderTasks > 1 {
mErr = multierror.Append(mErr, fmt.Errorf("Only one task may be marked as leader"))
}

for _, task := range tasks {
ls := a.LastStartOfTask(task.Name)
if !ls.IsZero() && ls.Before(oldestStart) {
oldestStart = a.LastStartOfTask(task.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
oldestStart = a.LastStartOfTask(task.Name)
oldestStart = ls

We don't need to call the function again.

Comment on lines 11512 to 11514
if oldestStart == now {
return a.LastStartOfTask(tasks[0].Name)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we return the first one in this case? If no task has started yet I would imagine StartOfOldestTask to return 0 instead of a somewhat random value? 🤔

Copy link
Member Author

@Juanadelacuesta Juanadelacuesta Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zero is the oldest date possible, that can interfere with the logic else where, and doesn't make much sense.
I needed a default value and that one seemed as good as any other.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The zero time in Go is often used as a sentinel for "not set" (https://cs.opensource.google/go/go/+/refs/tags/go1.22.1:src/time/time.go;l=364-365), that's why IsZero() exists and is a handy check to have 😄

I think returning the time for the first task is an unexpected default and, looking at the code again, we're effectively already returning zero here because the only way to enter this conditional is if oldestStart is not updated, which requires at least one task to be non-zero.

So the only path that leads to this return is if all tasks have zero time, so returning the time for the first task will always return zero, but it's less explicit about and harder to reason.

Copy link
Contributor

@lgfa29 lgfa29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small last comments

nomad/structs/structs_test.go Outdated Show resolved Hide resolved
scheduler/reconnecting_picker/reconnecting_picker.go Outdated Show resolved Hide resolved
@Juanadelacuesta Juanadelacuesta merged commit ff72248 into main Mar 15, 2024
19 checks passed
@Juanadelacuesta Juanadelacuesta deleted the f-hg-15144-disconnect-2 branch March 15, 2024 12:42
philrenaud pushed a commit that referenced this pull request Apr 18, 2024
This commit introduces the new options for reconciling a reconnecting allocation and its replacement:

Best score (Current implementation)
Keep original
Keep replacement
Keep the one that has run the longest time
It is achieved by adding a new dependency to the allocReconciler that calls the corresponding function depending on the task group's disconnect strategy. For more detailed information, refer to the new stanza for disconnected clientes RFC.

It resolves 15144
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants