Executor Service #1 (#253) #4010

Merged: MustafaI merged 2 commits into master on Oct 17, 2024

MustafaI (Contributor) commented Oct 17, 2024

(cherry picked from commit 35cb59f)

  • Adding ControlPlaneEventsTopic to pulsar config

  • Evolving ControlPlaneEvents message structure

We've decided on a parent/wrapper message for the ControlPlaneEvents to avoid passing around ambiguous proto.Message slices in the Publisher and Ingester.

  • Setting maxAllowedMessageSize to correct value in relevant tests

  • Removing reason for uncordon requests to the executor service

  • Moving event creation time to the parent Control Plane Event, modifying executor service RPCs to reflect the events being published, and changing Pulsar message keys to hard-coded strings rather than the proto name

  • Renaming UpdateExecutorSettings rpc to UpsertExecutorSettings

  • Removing message keys from ControlPlaneEvent messages, reverting method name changes

  • Renaming LimitEventSequencesByteSize

  • Adding executor cordoning functionality to armadactl

  • Renaming ControlPlaneEvent to Event

  • Simplifying executor cordoning code

  • Saner checks on the UpsertExecutorSettings rpc, with better error messages

  • Typo

  • Updated command descriptions for executor cordoning and uncordoning

  • Separating executor service args from controlplaneevents

  • Executor Service #2 (#254)

  • Generalising common ingestion pipeline

  • Removing unused config

  • Amending comments and variable names in common ingestion pipeline to be more event agnostic

  • Returning to original metric name, denoting ingested event type via label rather than metric name

  • Import ordering

  • Generalising pulsar publisher

  • Executor Service #3 (#255)

  • Modifying SchedulerIngester to ingest control plane events, creating executor settings table and associated plumbing

  • Simplifying dbops merge for controlplaneevents

  • Moving DBOperation scoping into schedulerdb

  • Adding GetOperation method to DBOperation, determining locking using this

  • Executor Service #4 (#257)

  • Implementing cluster cordoning in scheduler

  • Filtering executors from the previous filter's result

  • Adding default value for queue label when publishing controlplaneevent metrics


Fixes #

Special notes for your reviewer:

Summary of changes:

  • Implemented an Executor Service
    • Allows users to upsert and delete scheduler executor_settings
  • Generalised the common publisher and ingest pipeline, enabling multiple event models
  • Enabling cordoning and uncordoning of executors through scheduler executor_settings
    • Exposing this functionality through armadactl
  • Implemented a new Control Plane event model

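Cordoning in the scheduler, together with "filtering executors from the previous filter's result", can be sketched as a chain of filters where each one sees only what the previous filter kept. `Executor`, `ExecutorSettings`, `applyFilters`, and `notCordoned` are hypothetical simplifications, not the scheduler's real types; treating executors without a settings row as schedulable is also an assumption.

```go
package main

// Hypothetical simplifications of the scheduler's types.
type Executor struct {
	ID string
}

type ExecutorSettings struct {
	ExecutorID string
	Cordoned   bool
}

// filterFunc returns true to keep an executor.
type filterFunc func(Executor) bool

// applyFilters runs each filter over the previous filter's output, so later
// filters never see executors an earlier filter has already removed.
func applyFilters(executors []Executor, filters ...filterFunc) []Executor {
	result := executors
	for _, keep := range filters {
		next := make([]Executor, 0, len(result))
		for _, ex := range result {
			if keep(ex) {
				next = append(next, ex)
			}
		}
		result = next
	}
	return result
}

// notCordoned builds a filter from executor settings; executors with no
// settings row are treated as schedulable.
func notCordoned(settings []ExecutorSettings) filterFunc {
	cordoned := make(map[string]bool, len(settings))
	for _, s := range settings {
		cordoned[s.ExecutorID] = s.Cordoned
	}
	return func(ex Executor) bool { return !cordoned[ex.ID] }
}
```

Chaining on the previous result, rather than re-filtering the full executor list each time, keeps an executor excluded once any filter (cordoning included) has rejected it.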
mustafai-gr and others added 2 commits October 17, 2024 16:27
* Move PulsarConfig into common/config (#217) (#3907)

* ARMADA-2848 Move PulsarConfig into commonconfig

* Update test name TestValidateHasJobSetID->Id

* Revert unintended changes to yarn.lock file

* fix import order

Co-authored-by: Eleanor Pratt <[email protected]>

(cherry picked from commit 35cb59f)
Signed-off-by: mustaily891 <[email protected]>

---------

Signed-off-by: mustaily891 <[email protected]>
Co-authored-by: Eleanor Pratt <[email protected]>
MustafaI merged commit 421dc6d into master on Oct 17, 2024
36 checks passed
MustafaI deleted the sendToGitHub/executor-service-3 branch on October 17, 2024 17:26
3 participants