Scale out with propeller manager and workflow sharding #351
Signed-off-by: Daniel Rammer <[email protected]>
cmd/controller/cmd/root.go
Outdated
}

if len(cfg.ExcludeShardKey) > 0 {
	labelSelectorRequirement := v1.LabelSelectorRequirement{"shardKey", v1.LabelSelectorOpNotIn, cfg.IncludeShardKey}
Suggested change:
- labelSelectorRequirement := v1.LabelSelectorRequirement{"shardKey", v1.LabelSelectorOpNotIn, cfg.IncludeShardKey}
+ labelSelectorRequirement := v1.LabelSelectorRequirement{"shardKey", v1.LabelSelectorOpNotIn, cfg.ExcludeShardKey}
How will we use exclude? Isn't this harder?
Exclude is used for the "enableUncoveredReplica" option, mainly with the project and domain shard strategies. It is only included on the shard key option for completeness, but it's easy to remove.
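To make the exclude case in this thread concrete, here is a minimal sketch of how include and exclude keys could translate into `In` / `NotIn` requirements. `Requirement` is a local stand-in for `v1.LabelSelectorRequirement`, and `ShardConfig`, `BuildRequirements`, and `Matches` are hypothetical names for illustration, not the PR's code.

```go
package main

// Operator mirrors the In / NotIn operators of a Kubernetes label selector.
type Operator string

const (
	OpIn    Operator = "In"
	OpNotIn Operator = "NotIn"
)

// Requirement is a local stand-in for v1.LabelSelectorRequirement.
type Requirement struct {
	Key      string
	Operator Operator
	Values   []string
}

// ShardConfig is a hypothetical config holding the two key lists from the thread.
type ShardConfig struct {
	IncludeShardKey []string
	ExcludeShardKey []string
}

// BuildRequirements pairs each list with its own operator, so the NotIn
// requirement is built from ExcludeShardKey (the fix suggested above).
func BuildRequirements(cfg ShardConfig) []Requirement {
	var reqs []Requirement
	if len(cfg.IncludeShardKey) > 0 {
		reqs = append(reqs, Requirement{Key: "shardKey", Operator: OpIn, Values: cfg.IncludeShardKey})
	}
	if len(cfg.ExcludeShardKey) > 0 {
		reqs = append(reqs, Requirement{Key: "shardKey", Operator: OpNotIn, Values: cfg.ExcludeShardKey})
	}
	return reqs
}

// Matches applies label selector semantics: In needs the label to be present
// with a listed value; NotIn also matches objects that lack the label.
func Matches(reqs []Requirement, labels map[string]string) bool {
	for _, r := range reqs {
		value, present := labels[r.Key]
		found := false
		if present {
			for _, v := range r.Values {
				if v == value {
					found = true
					break
				}
			}
		}
		if r.Operator == OpIn && !found {
			return false
		}
		if r.Operator == OpNotIn && found {
			return false
		}
	}
	return true
}
```

Under these assumptions, a replica configured with `ExcludeShardKey: []string{"0", "1"}` would pick up every workflow whose `shardKey` label is neither `0` nor `1`, including workflows with no `shardKey` label at all, which is the "uncovered" behaviour described above.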
@hamersaw I was talking with @EngHabu and he mentioned one important thing: all of the pods should be created with an owner reference. It should be the same as the flytepropeller deployment owner reference that is populated on the manager. That way, deleting the deployment automatically deletes all the pods that were created.
This is completed. To track this, we copy the owner references (if they exist) from the pod in which the manager is running onto the managed pods. This approach differs from the previous shutdown hook, which deleted all managed pods whenever the manager exited, so a failure in the manager deleted them just as a successful shutdown did. With the new approach, failures in the manager pod do not affect managed pods.
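As a rough illustration of the owner-reference approach described in this comment, the sketch below copies the manager pod's owner references onto each pod it creates. `OwnerReference` and `Pod` are simplified local stand-ins for the Kubernetes API types, and `NewManagedPod` is a hypothetical helper, not a function from this PR.

```go
package main

// OwnerReference is a simplified stand-in for the Kubernetes owner reference type.
type OwnerReference struct {
	APIVersion string
	Kind       string
	Name       string
	UID        string
}

// Pod is a simplified stand-in carrying only the fields this sketch needs.
type Pod struct {
	Name            string
	OwnerReferences []OwnerReference
}

// NewManagedPod returns a pod that inherits the manager pod's owner
// references, if any. The slice is copied so later changes to one pod's
// references do not leak into the other.
func NewManagedPod(manager Pod, name string) Pod {
	refs := make([]OwnerReference, len(manager.OwnerReferences))
	copy(refs, manager.OwnerReferences)
	return Pod{Name: name, OwnerReferences: refs}
}
```

Because the managed pods share the manager's owner rather than being owned by the manager pod itself, a manager crash leaves them untouched, while deleting the shared owner lets Kubernetes garbage collection remove them.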
A couple of nits
Feel free to ignore...
* added 'manager' command
* using go routine and timer for manager loop
* moved manager loop out of cmd and into pkg directory
* detecting missing replicas
* moved extracting replica from pod name to new function
* creating managed flytepropeller pods
* refactored configuration
* removed regex parsing for replica - checking for existence with fully qualified pod name
* mocked out shard strategy abstraction
* adding arguments to podspec for ConsistentHashingShardStrategy
* updated import naming
* moved manager to a top-level package
* added shard strategy to manager configuration
* setting shard key label selector on managed propeller instances
* fixed random lint issues
* split pod name generate to separate function to ease future auto-scaler implementation
* cleaned up pod label selector
* delete pods on shutdown
* added prometheus metric reporting
* updated manager run loop to use k8s wait.UntilWithContext
* moved getKubeConfig into a shared package
* assigning shard and namespace labels on FlyteWorkflow
* implement NamespaceShardStrategy
* implemented NamespaceShardStrategy
* fixed shard label
* added comments
* checking for existing pods on startup
* handling delete of non-existent pod
* changes ConsistentHashing name to Random - because that's what it really is
* implemented EnableUncoveredReplica configuration option
* added leader election to manager using existing propeller config
* fixed disable leader election in managed propeller pods
* removed listPods function
* added leader election to mitigate concurrent modification issues
* enabled pprof to profile resource metrics
* added 'manager' target to Makefile to start manager in development mode (similar to existing server)
* added shard strategy test for computing key ranges
* fixed key range computation
* implemented project and domain shard types
* returning error on out of range podIndex during UpdatePodSpec call on shard strategy
* fixed random lint issues
* added manager tests
* fixed lint issues
* added doc comments on exported types and functions
* exporting ComputeKeyRange function and changed adding addLabelSelector function name to addLabelSelectorIfExists to better reflect functionality
* adding pod template resource version and shard config hash annotations to fuel automatic pod management on updates
* removed pod deletion on manager shutdown
* cleaned up unit tests and lint
* updated getContainer function to retrieve flytepropeller container from pod spec using container name instead of command
* removed addLabelSelectorIfExists function call
* changed bytes.Buffer from a var to declaring with new
* created a new shardstrategy package
* generating mocks for ShardStrategy to decouple manager package tests from shardstrategy package tests
* fixed lint issues
* changed shard configuration definitions and added support for wildcard id in EnvironmentShardStrategy
* updated documentation
* fixed lint issues
* setting managed pod owner references
* updated documentation
* fixed a few nits
* delete pods with failed state
* changed ShardType type to int instead of string
* removed default values in manager config
* updated config_flags with pflags generation

Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Haytham Abuelfutuh <[email protected]>
* Scale out with propeller manager and workflow sharding (#351)
* Create codeql-analysis.yml
* Handle code quality issue
* check boundaries
* 0 is ok
* Use ParseUint instead
* bump for DCO

Signed-off-by: Haytham Abuelfutuh <[email protected]>
Co-authored-by: Dan Rammer <[email protected]>
TL;DR
Adds a FlytePropeller Manager component that is responsible for configuring a collection of FlytePropeller instances and ensuring their liveness. FlyteWorkflow CRDs are sharded over these instances so that each workflow is deterministically handled by exactly one instance. This enables horizontal scaling of FlytePropeller.
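As a rough sketch of what deterministic sharding over a fixed keyspace can look like, the example below splits a keyspace into contiguous per-pod ranges and maps a workflow ID onto a key. Everything here is an assumption for illustration: `keyRange` and `shardKey` are hypothetical helpers, the hash-based key is a swapped-in technique rather than the PR's shard strategies, and the PR's own `ComputeKeyRange` may have a different signature and behaviour.

```go
package main

import (
	"errors"
	"hash/fnv"
)

// keyRange returns the half-open range [start, end) of shard keys owned by
// the pod at podIndex when keyspaceSize keys are split across podCount pods.
// Any remainder is spread over the lowest-indexed pods.
func keyRange(keyspaceSize, podCount, podIndex int) (int, int, error) {
	if podCount <= 0 || keyspaceSize < podCount {
		return 0, 0, errors.New("keyspace must contain at least one key per pod")
	}
	if podIndex < 0 || podIndex >= podCount {
		return 0, 0, errors.New("pod index out of range")
	}
	base := keyspaceSize / podCount
	extra := keyspaceSize % podCount
	offset := extra
	if podIndex < extra {
		offset = podIndex
	}
	start := podIndex*base + offset
	end := start + base
	if podIndex < extra {
		end++
	}
	return start, end, nil
}

// shardKey maps a workflow ID onto a key in [0, keyspaceSize) using FNV-1a,
// so the same ID always yields the same key. keyspaceSize must be positive.
func shardKey(workflowID string, keyspaceSize int) int {
	h := fnv.New32a()
	h.Write([]byte(workflowID)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(keyspaceSize))
}
```

With a keyspace of 32 and 3 pods, this yields the ranges [0, 11), [11, 22), and [22, 32), so every key belongs to exactly one pod and a pod index of 3 is rejected as out of range.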
Type
Are all requirements met?
Complete description
Functionality has been implemented in accordance with the accepted RFC.
Tracking Issue
flyteorg/flyte#125
Follow-up issue
NA