Enable Resource Manager for k8s array plugin #71
Conversation
@DouglasCurbelo / @migueltol22 can we please create an issue for this and add it to the description?
@@ -78,6 +80,15 @@ func CheckSubTasksState(ctx context.Context, tCtx core.TaskExecutionContext, kub

	actualPhase := phaseInfo.Phase()
	if phaseInfo.Phase().IsSuccess() {

		// Release token
		resourceNamespace := core.ResourceNamespace(tCtx.TaskExecutionMetadata().GetOwnerID().Namespace)
Release has to be done in case of success, failure, or aborts, right? I would do this in Finalize(): no matter what, once you want to finish the task we invoke Finalize, so that would be the right place to do the token release.
Also, writing a simplified interface that launches, monitors, aborts, and finalizes one job might be a nicer way of arranging the code - wdyt?
So my question with doing this in finalize is that currently finalize will be called after all of the subtasks get run. What if we have 1000 subtasks and only 300 resources? In that case we'll never release a resource (assuming my understanding is correct). I do agree that this should be done where each subtask has its own launch, monitor, etc. Any ideas on the best way to handle this? Would introducing a new phase such as processingChildren be a good approach?
@migueltol22 good question. I would always release on successful completion of a task. The finalize of the one job should probably be invoked in two places: immediately after success and once in the finalize of the parent. If the finalize previously succeeded it would just be a no-op, so you would not have to worry about not releasing resources.
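For concreteness, a minimal sketch of that idea, assuming the pluginmachinery ResourceManager exposes ReleaseResource(ctx, namespace, token) and that releasing an already-released token is a no-op; the helper names here are made up:

package k8s

import (
	"context"

	"github.com/lyft/flyteplugins/go/tasks/pluginmachinery/core"
)

// releaseToken gives one subtask's token back to the resource manager. It gets called
// right after the subtask succeeds and again from the parent's Finalize; the second
// call is expected to be a no-op.
func releaseToken(ctx context.Context, tCtx core.TaskExecutionContext, pool core.ResourceNamespace, token string) error {
	return tCtx.ResourceManager().ReleaseResource(ctx, pool, token)
}

// finalizeAll is what the parent's Finalize could do: release every subtask's token,
// regardless of how far each subtask got.
func finalizeAll(ctx context.Context, tCtx core.TaskExecutionContext, pool core.ResourceNamespace, tokens []string) error {
	for _, token := range tokens {
		if err := releaseToken(ctx, tCtx, pool, token); err != nil {
			return err
		}
	}
	return nil
}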
@@ -78,6 +80,15 @@ func CheckSubTasksState(ctx context.Context, tCtx core.TaskExecutionContext, kub

	actualPhase := phaseInfo.Phase()
	if phaseInfo.Phase().IsSuccess() {
What happens if the pod fails? Do we release the token as well?
@@ -94,6 +96,16 @@ func LaunchSubTasks(ctx context.Context, tCtx core.TaskExecutionContext, kubeCli

	pod = ApplyPodPolicies(ctx, config, pod)

	// Allocate Token
	resourceNamespace := core.ResourceNamespace(pod.Namespace)
	allocationStatus, err := tCtx.ResourceManager().AllocateResource(ctx, resourceNamespace, pod.Name, core.ResourceConstraintsSpec{})
Noob question: here we allocate resources using pod.Namespace, but in the initialization step we call:
iCtx.ResourceRegistrar().RegisterResourceQuota(ctx, core.ResourceNamespace(primaryLabel), tokenLimit)
Does that mean we can only request tokens with the same label we used during registration? If so, would enforcing limits per namespace require registering quotas for each namespace on initialization?
@kumare3 ?
I also have the same question about how resource namespace and requests are used
I don't think I understand your question fully, but let me provide some details about RM to see if that resolves your concern.
RM is basically a pooling system for plugins. A plugin requests to create one or more pools during setup time (using ResourceRegistrar().RegisterResourceQuota()). At the end of setup time, the RM is created based on the valid pool-creation requests, and the actual pools are created. During execution time, the plugin sends token-allocate and token-release requests to RM, and RM tries to put and remove tokens to and from the pools specified in the requests. So the term ResourceNamespace here just means the name of a token pool, nothing else. It has nothing to do with the users' namespace or the k8s namespace (i.e. <project>-<domain>). How you want to compose a ResourceNamespace is totally up to you.
I just realized that the argument names used in the function signature are misleading in this case. Those names made sense when I first wrote RM, but they have become confusing in its current form. I'll try to fix the naming soon. Sorry for the confusion.
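To make the setup-time half concrete, a minimal sketch of registering one pool; the import path and the int quota type are assumptions, and as described above the pool name is arbitrary:

package k8s

import (
	"context"

	"github.com/lyft/flyteplugins/go/tasks/pluginmachinery/core"
)

// registerPool creates one token pool at plugin setup time. The "resource namespace"
// is just the pool's name; it does not need to match any k8s or project-domain namespace.
func registerPool(ctx context.Context, iCtx core.SetupContext, poolName string, tokenLimit int) error {
	return iCtx.ResourceRegistrar().RegisterResourceQuota(ctx, core.ResourceNamespace(poolName), tokenLimit)
}

Execution-time allocation and release then target the same pool name; that half is sketched in the state-machine example further down.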
This is an awesome start! But I think this will need to be a more involved change than this...
The state machine is currently simple:
- Launch -> Monitor -> Compute Output/Error -> Succeed/Fail
Instead, I think you will need to combine the first two:
- Launch&Monitor -> Compute Output/Error -> Succeed/Fail
And in Launch&Monitor it should check the subtask's phase (which is kind of its own state machine):
case NotLaunched:
  -> AcquireToken
case HasToken:
  -> Launch
case Launched:
  -> Monitor
case Finished:
  -> ReleaseResource
You should also implement Finalize on the plugin and release all resources (it's idempotent; even if it was called before, it's ok to call it again). A sketch of this per-subtask loop follows below.
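A rough, non-authoritative sketch of that per-subtask state machine, reusing the AllocateResource/ReleaseResource calls already shown in this PR; the AllocationStatusGranted constant is assumed from the core package, and the phase names and the launch/poll callbacks are placeholders:

package k8s

import (
	"context"

	"github.com/lyft/flyteplugins/go/tasks/pluginmachinery/core"
)

// Illustrative per-subtask phases; the real plugin will have its own names.
type subTaskPhase int

const (
	phaseNotLaunched subTaskPhase = iota
	phaseHasToken
	phaseLaunched
	phaseFinished
)

// launchAndMonitorOne advances a single subtask through the mini state machine described
// above. launch and poll are hypothetical callbacks standing in for the pod create/check code.
func launchAndMonitorOne(
	ctx context.Context,
	tCtx core.TaskExecutionContext,
	pool core.ResourceNamespace,
	token string,
	phase subTaskPhase,
	launch func(context.Context) error,
	poll func(context.Context) (bool, error),
) (subTaskPhase, error) {
	switch phase {
	case phaseNotLaunched:
		// AcquireToken: only advance once the resource manager grants a token.
		status, err := tCtx.ResourceManager().AllocateResource(ctx, pool, token, core.ResourceConstraintsSpec{})
		if err != nil || status != core.AllocationStatusGranted {
			return phaseNotLaunched, err
		}
		return phaseHasToken, nil
	case phaseHasToken:
		// Launch: create the pod.
		if err := launch(ctx); err != nil {
			return phaseHasToken, err
		}
		return phaseLaunched, nil
	case phaseLaunched:
		// Monitor: check whether the pod reached a terminal phase.
		done, err := poll(ctx)
		if err != nil || !done {
			return phaseLaunched, err
		}
		return phaseFinished, nil
	default: // phaseFinished
		// ReleaseResource: hand the token back so the next subtask can run.
		return phaseFinished, tCtx.ResourceManager().ReleaseResource(ctx, pool, token)
	}
}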
go/tasks/plugins/array/k8s/config.go
Outdated
// Defines custom config for K8s Array plugin
type Config struct {
	DefaultScheduler     string `json:"scheduler" pflag:",Decides the scheduler to use when launching array-pods."`
	MaxErrorStringLength int    `json:"maxErrLength" pflag:",Determines the maximum length of the error string returned for the array."`
	MaxArrayJobSize      int64  `json:"maxArrayJobSize" pflag:",Maximum size of array job."`
	OutputAssembler      workqueue.Config
	ErrorAssembler       workqueue.Config
	TokenConfigs         TokenConfig
Maybe ResourcesConfig?
go/tasks/plugins/array/k8s/config.go
Outdated
@@ -32,13 +32,19 @@ var (
	configSection = config.MustRegisterSection(configSectionKey, defaultConfig)
)

type TokenConfig struct {
	primaryLabel string
These will need to be exported, e.g.:
-	primaryLabel string
+	PrimaryLabel string `json:"primaryLabel" pflag:",What is this?"`
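For reference, a sketch of the exported struct based on the PrimaryLabel/Limit fields that show up later in this PR; the json tag for Limit and the pflag descriptions are guesses:

package k8s

// ResourceConfig is a sketch of the exported form so that json/pflag can pick the fields up.
type ResourceConfig struct {
	PrimaryLabel string `json:"primaryLabel" pflag:",Primary label (token pool name) registered with the resource manager."`
	Limit        int    `json:"limit" pflag:",Maximum number of tokens, i.e. concurrently running array pods."`
}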
go/tasks/plugins/array/core/state.go
Outdated
		} else {
			totalRunning += count
		}
	}

	if totalWaitingForResources > 0 {
		logger.Infof(ctx, "Array is still running and waiting for resources totalWaitingForResources[%v]", totalWaitingForResources)
		return PhaseCheckingSubTaskExecutions
why not PhaseWaitingForResources?
Good point. Updated.
		Summary:  arraystatus.ArraySummary{},
		Detailed: arrayCore.NewPhasesCompactArray(uint(currentState.GetExecutionArraySize())),
	}

	if int64(currentState.GetExecutionArraySize()) > config.MaxArrayJobSize {
nit: why not do this check before declaring all the variables above?
updated.
	// The first time we enter this state we will launch every subtask. On subsequent rounds, the pod
	// has already been created so we return a Success value and continue with the Monitor step.
	var status TaskStatus
	status, err = task.Launch(ctx, tCtx, kubeClient)
sorry if i'm missing something obvious here, but if the task is not terminal we always launch? what if it's already running?
Yes, that is correct. If it's already running we will get an alreadyExists error and proceed with the rest of the flow in Monitor.
thanks for explaining
go/tasks/plugins/array/k8s/task.go
Outdated
	Success TaskStatus = iota
	Error
	Waiting
	ReturnState
it's not immediately clear what this TaskStatus represents. can you add a comment explaining 'ReturnState'?
So there are essentially two state machines: one for the entire job and one for the subtasks. There are cases where we would like to return the entire job's state based off the outcome of a subtask. I couldn't come up with a great name. Any ideas? (Will add a comment after deciding on a name.)
I think it's also confusing that we return Success from Launch() when we successfully launch and we return the same Success from Monitor() when the task finishes successfully.
Maybe you just need LaunchResult and MonitorResult (two separate enums).
Updated to use two separate enums. Still unsure about the name for returnState. Any ideas for a better name?
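A small sketch of the two-enum split being discussed; the identifier names here are only suggestions, not the final ones:

package k8s

// LaunchResult describes the outcome of trying to launch one subtask.
type LaunchResult int8

const (
	LaunchSuccess     LaunchResult = iota // pod created (or it already existed)
	LaunchError                           // unrecoverable error while creating the pod
	LaunchWaiting                         // no token granted yet; retry on the next round
	LaunchReturnState                     // stop processing children and return the current array state
)

// MonitorResult describes the outcome of checking one already-launched subtask.
type MonitorResult int8

const (
	MonitorSuccess MonitorResult = iota // subtask finished successfully
	MonitorError                        // subtask failed
	MonitorRunning                      // subtask is still running
)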
	var args []string
	if len(podTemplate.Spec.Containers) > 0 {
		args = append(podTemplate.Spec.Containers[0].Command, podTemplate.Spec.Containers[0].Args...)
do we always assume that there is at most one container in the podspec? should we error if there are more? 0? in line 60 and elsewhere you reference the Containers[0] again without this check
I'm not entirely sure. This was the code that was there beforehand.
Hm, it seems odd that we check before de-referencing here but not below. Can we maybe check and just throw an error?
Updated to throw an error. Not sure if this is what you meant, or should the error be thrown below?
@katrogan ^^
yes, thank you!
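One possible shape for that check, returning an error rather than silently skipping; buildArgs and the use of fmt.Errorf are illustrative stand-ins for the plugin's actual helpers:

package k8s

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// buildArgs concatenates the first container's command and args, and errors out
// instead of silently skipping when the pod template has no containers.
func buildArgs(podTemplate v1.Pod) ([]string, error) {
	if len(podTemplate.Spec.Containers) == 0 {
		// fmt.Errorf stands in for whatever error helper the plugin already uses.
		return nil, fmt.Errorf("pod template must contain at least one container")
	}
	c := podTemplate.Spec.Containers[0]
	return append(c.Command, c.Args...), nil
}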
go/tasks/plugins/array/k8s/task.go
Outdated
	return Success, nil
}

func (t Task) Abort() {}
why is this a no-op?
So abort in the handle step had never been implemented; it was just calling finalize. I can update it to actually call abort. I just have it like that because it was the preexisting implementation.
go/tasks/plugins/array/k8s/task.go
Outdated
}

func allocateResource(ctx context.Context, tCtx core.TaskExecutionContext, config *Config, podName string, childIdx int, arrayStatus *arraystatus.ArrayStatus) (bool, error) {
What does the bool in the response represent?
So in the case where we do not get a resource, we do not want to continue with the flow; we'd like to proceed to the next iteration of the for loop. I'm thinking a better solution would be to just return the allocation status and have the check done by the caller of allocateResource. Thoughts?
that sounds reasonable!
Updated.
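A sketch of the agreed direction: the helper returns the raw allocation status and the caller decides whether to skip to the next child. AllocationStatusGranted is assumed to be the "granted" value in the core package, and the loop body is only indicative:

package k8s

import (
	"context"

	"github.com/lyft/flyteplugins/go/tasks/pluginmachinery/core"
)

// allocateResource now just surfaces the allocation status; the caller decides what to do with it.
func allocateResource(ctx context.Context, tCtx core.TaskExecutionContext, pool core.ResourceNamespace, podName string) (core.AllocationStatus, error) {
	return tCtx.ResourceManager().AllocateResource(ctx, pool, podName, core.ResourceConstraintsSpec{})
}

// launchLoop shows the caller-side check: children that do not get a token this round
// are skipped and retried on the next evaluation.
func launchLoop(ctx context.Context, tCtx core.TaskExecutionContext, pool core.ResourceNamespace, podNames []string) error {
	for _, name := range podNames {
		status, err := allocateResource(ctx, tCtx, pool, name)
		if err != nil {
			return err
		}
		if status != core.AllocationStatusGranted {
			// No token available right now; mark this child as waiting and move on.
			continue
		}
		// ... create the pod for this child ...
	}
	return nil
}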
go/tasks/plugins/array/k8s/config.go
Outdated
// Defines custom config for K8s Array plugin
type Config struct {
	DefaultScheduler     string         `json:"scheduler" pflag:",Decides the scheduler to use when launching array-pods."`
	MaxErrorStringLength int            `json:"maxErrLength" pflag:",Determines the maximum length of the error string returned for the array."`
	MaxArrayJobSize      int64          `json:"maxArrayJobSize" pflag:",Maximum size of array job."`
	ResourceConfig       ResourceConfig `json:"resourceConfig" pflag:"-,ResourceConfiguration to limit number of resources used by k8s-array."`
Any reason for the - in pflag? This will make it not generate pflag options... Desired, or a copy-paste error?
Good catch. Had it originally like that because I was using a map and pflag doesn't support that type. Updated.
go/tasks/plugins/array/k8s/task.go
Outdated
func (t Task) Abort() {}

func (t Task) Finalize(ctx context.Context, tCtx core.TaskExecutionContext, kubeClient core.KubeClient) error {
So this is the implementation of Abort. Finalize should just try to deallocateResource.
I see. Updated abort and finalize.
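A hedged sketch of that split: Abort deletes the pod, Finalize only releases the token. The KubeClient.GetClient().Delete call and the helper names are assumptions:

package k8s

import (
	"context"

	v1 "k8s.io/api/core/v1"
	k8serrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	"github.com/lyft/flyteplugins/go/tasks/pluginmachinery/core"
)

// abortSubTask deletes the subtask's pod; a NotFound error just means it is already gone.
func abortSubTask(ctx context.Context, kubeClient core.KubeClient, name, namespace string) error {
	pod := &v1.Pod{ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace}}
	if err := kubeClient.GetClient().Delete(ctx, pod); err != nil && !k8serrors.IsNotFound(err) {
		return err
	}
	return nil
}

// finalizeSubTask only tries to give the token back; releasing an already-released
// token is expected to be a no-op.
func finalizeSubTask(ctx context.Context, tCtx core.TaskExecutionContext, pool core.ResourceNamespace, token string) error {
	return tCtx.ResourceManager().ReleaseResource(ctx, pool, token)
}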
Apologies about the delay, just noticed you requested feedback :)
Awesome! I hear we've already used this in some experiments? Fantastic... Please make sure Katrina is ok with your responses/resolutions to the comments she left.
I think that's as far as I'll be able to spot by just reviewing the code. I haven't seen many unit tests added though. Are you working on that?
@katrogan @migueltol22, sorry I stopped following this change. Is the change ready to merge? If not, what is missing?
if IsResourceConfigSet() {
	primaryLabel := GetConfig().ResourceConfig.PrimaryLabel
	limit := GetConfig().ResourceConfig.Limit
	if err := iCtx.ResourceRegistrar().RegisterResourceQuota(ctx, core.ResourceNamespace(primaryLabel), limit); err != nil {
@EngHabu regarding the resource name: We discussed earlier that we might need to add flyte cluster information to the resource name as a preparation step for moving toward a centralized Redis. Do you think the change should be done in this PR or a separate one? cc @migueltol22 just FYI
I might have been mistaken. Maybe there is no way to get the cluster name... we might have to do something in our deployment to do that... might not be so trivial... let me ask on the channel..
Alright, found out we already pass the cluster name to the spark deployment, so we can do the same for flytepropeller to make it aware of its cluster name.
This is great! We should then be able to move to elasticache without a problem
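If the cluster name does get plumbed into flytepropeller, one purely illustrative way to scope pools per cluster would be a prefixed pool name; this is not part of this PR:

package k8s

import (
	"fmt"

	"github.com/lyft/flyteplugins/go/tasks/pluginmachinery/core"
)

// poolNamespace is a hypothetical helper: prefix the cluster name so token pools stored
// in a shared (centralized) Redis do not collide across Flyte clusters.
func poolNamespace(clusterName, primaryLabel string) core.ResourceNamespace {
	return core.ResourceNamespace(fmt.Sprintf("%s-%s", clusterName, primaryLabel))
}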
TL;DR
Use resource manager for k8s array plugin as a rate limiter to k8s.
Example Config that could be used for resource manager:
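For illustration, one way such a config could look, expressed through the Config/ResourceConfig structs added in this PR; the values are made up:

package k8s

// Illustrative values only: a k8s-array plugin config with the resource manager enabled.
var exampleConfig = Config{
	MaxArrayJobSize: 100,
	ResourceConfig: ResourceConfig{
		PrimaryLabel: "token", // pool name registered with the resource manager
		Limit:        300,     // at most 300 array pods running at once
	},
}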
Tracking Issue
flyteorg/flyte#201
Follow-up issue
NA