Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop platform: advice to shutdown workers #39

Merged
merged 3 commits into from
Dec 4, 2023

Conversation

reynir
Copy link
Contributor

@reynir reynir commented Nov 20, 2023

When dropping a platform it is best to first disable the relevant workers. Otherwise the workers will eventually recreate the platform.

Fixes #38

When dropping a platform it is best to first disable the relevant
workers. Otherwise the workers will eventually recreate the platform.
app/server.ml Outdated Show resolved Hide resolved
@hannesm
Copy link
Contributor

hannesm commented Nov 21, 2023

So, what are the overall expectations?

  • we have a drop platform command, the builder server receives it.
  • all connected workers with that platform should be disconnected / terminated?
  • any new worker with that platform should immediately be rejected

But, there's no persistence of dropped platforms (should there be some)? So, a restart of the builder-server and a connecting builder-worker will execute such jobs again -- is this fine? do we want to persist the list of dropped platforms? (I think I'm in favour thereof). Also, I'd like the find_job to figure that the platform is dropped (and close the worker connection)..

@reynir
Copy link
Contributor Author

reynir commented Nov 21, 2023

it's not clear to me what a good way forward is. It is not nice to leave workers hanging. Closing the connection or terminating the worker is nice except we usually put our workers in an infinite loop involving installing a lot of packages; so that is not desirable.

Maybe persisting dropped platforms and refusing to drop a platform we have workers connecting thereby forcing a flow where the workers are first shut down. Then we still have the corner case where a worker is "rebooting" while the platform is dropped. Hmm.

@hannesm
Copy link
Contributor

hannesm commented Nov 21, 2023

Oh, another path is server-only: persist the dropped platforms, and just don't ever give such a worker a job. Thus the worker will be there and alive, but not receiving anything. IMHO that would be fine.

@reynir
Copy link
Contributor Author

reynir commented Nov 21, 2023

I added a commit to persist dropped platforms. Then I realized we might want to be able to "undrop" a platform. I have not yet added the command in the client, and I need to bump a version number somewhere. I think the logic is not so easy to follow now.

@hannesm
Copy link
Contributor

hannesm commented Nov 21, 2023

Which logic is not so easy to follow? The list of dropped platforms should also be inspectable (i.e. printed in the info output or such).

@hannesm
Copy link
Contributor

hannesm commented Dec 4, 2023

fine to merge when CI is happy

@hannesm hannesm merged commit d1f85e0 into robur-coop:main Dec 4, 2023
1 check passed
@reynir reynir deleted the drop-platform-help branch December 19, 2023 16:31
hannesm added a commit to hannesm/opam-repository that referenced this pull request Sep 4, 2024
CHANGES:

* improve documentation (robur-coop/builder#37 et al, fixes robur-coop/builder#27)
* adapt to asn1-combinators 0.3.0 API: remove cstruct (robur-coop/builder#49 @hannesm)
* queue up observe messages (robur-coop/builder#48 @reynir)
* use "/job/_/build/_/main-binary" alias - eases bootstrapping (robur-coop/builder#42 @reynir)
* drop platform: advice to shutdown workers (robur-coop/builder#39 @reynir)
* FreeBSD: add builder_worker service script (robur-coop/builder#37 @hannesm)
* client: enumerate valid periods in `--help` (robur-coop/builder#36 @reynir)
* add an interval of "never" to never schedule a job (robur-coop/builder#34 @hannesm, fixes robur-coop/builder#32)
* client: observe omit the UUID (robur-coop/builder#33 @hannesm)
avsm pushed a commit to avsm/opam-repository that referenced this pull request Sep 5, 2024
CHANGES:

* improve documentation (robur-coop/builder#37 et al, fixes robur-coop/builder#27)
* adapt to asn1-combinators 0.3.0 API: remove cstruct (robur-coop/builder#49 @hannesm)
* queue up observe messages (robur-coop/builder#48 @reynir)
* use "/job/_/build/_/main-binary" alias - eases bootstrapping (robur-coop/builder#42 @reynir)
* drop platform: advice to shutdown workers (robur-coop/builder#39 @reynir)
* FreeBSD: add builder_worker service script (robur-coop/builder#37 @hannesm)
* client: enumerate valid periods in `--help` (robur-coop/builder#36 @reynir)
* add an interval of "never" to never schedule a job (robur-coop/builder#34 @hannesm, fixes robur-coop/builder#32)
* client: observe omit the UUID (robur-coop/builder#33 @hannesm)
avsm pushed a commit to avsm/opam-repository that referenced this pull request Sep 5, 2024
CHANGES:

* improve documentation (robur-coop/builder#37 et al, fixes robur-coop/builder#27)
* adapt to asn1-combinators 0.3.0 API: remove cstruct (robur-coop/builder#49 @hannesm)
* queue up observe messages (robur-coop/builder#48 @reynir)
* use "/job/_/build/_/main-binary" alias - eases bootstrapping (robur-coop/builder#42 @reynir)
* drop platform: advice to shutdown workers (robur-coop/builder#39 @reynir)
* FreeBSD: add builder_worker service script (robur-coop/builder#37 @hannesm)
* client: enumerate valid periods in `--help` (robur-coop/builder#36 @reynir)
* add an interval of "never" to never schedule a job (robur-coop/builder#34 @hannesm, fixes robur-coop/builder#32)
* client: observe omit the UUID (robur-coop/builder#33 @hannesm)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

builder-client drop-platform confusion
3 participants