feat: enable draining csb via os signal handling #1096

nouseforaname · 2024-09-10T08:46:48Z

Checklist:

* Have you added or updated tests to validate the changed functionality?
* Have you added Release Notes in the docs repositories?
* Have you followed the Conventional Commits specification?

When running as an app in CF we can rely on the platform to handle TLS setup, but on a VM there is currently no way to encrypt traffic. TPCF-26820

The linter is angry about `.` imports, but we do not mind because this is not production code.

When a CSB app running in CF is stopped outside of its own lifecycle (e.g. the Diego cell is redeployed), we do not have a great way of ensuring that all in-flight Terraform executions can finish their work and write the resulting TF state back to the CSB DB. Diego assumes that an app will shut down gracefully within 10s of receiving SIGTERM; if it does not, the app receives a SIGKILL and stops abruptly. That creates orphaned resources in the underlying IaaS that the CSB cannot clean up, because it does not have the tfstate for the Terraform resources that were in flight when it was shut down.

To alleviate this issue, this introduces a graceful shutdown sequence and lock files on disk (to be consumed by a drain script). This makes it possible to deploy the CSB as a workload on a BOSH instance. Instead of marking specific SI instances as failed, this ensures that the broker will:

- stop accepting new requests
- finish all in-flight TF before shutdown

The drain script can be kept simple by inspecting a folder: if that folder is empty, it is safe to proceed to stop the CSB. We also tried a drain script based on inspecting the running processes (e.g. whether a tofu or provider binary is still being executed). That seems potentially unreliable, since time-of-check/time-of-use issues could falsely suggest that everything is finished (e.g. because we checked right between two invocations of the provider/tofu binaries).

- fly-by: some structs got their fields reordered to improve their memory footprint.
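The shutdown sequence described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `tracker` type, its `start` and `drain` methods, the `.lock` file naming, and `runDemo` are all hypothetical names. It assumes a Unix-like OS, since the demo sends SIGTERM to its own process to stand in for the platform stopping the app.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/signal"
	"path/filepath"
	"sync"
	"syscall"
	"time"
)

var errDraining = errors.New("broker is draining")

// tracker is a hypothetical helper (not the CSB implementation). It records
// each in-flight operation as a lock file and in a WaitGroup.
type tracker struct {
	dir      string
	mu       sync.Mutex
	draining bool
	wg       sync.WaitGroup
}

// start registers an operation and returns a func to call when it finishes.
// New operations are rejected once draining has begun.
func (t *tracker) start(id string) (func(), error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.draining {
		return nil, errDraining
	}
	lock := filepath.Join(t.dir, id+".lock")
	if err := os.WriteFile(lock, nil, 0o600); err != nil {
		return nil, err
	}
	t.wg.Add(1)
	return func() {
		_ = os.Remove(lock) // lock file disappears once the work is done
		t.wg.Done()
	}, nil
}

// drain stops accepting new work, then blocks until in-flight work is done.
func (t *tracker) drain() {
	t.mu.Lock()
	t.draining = true
	t.mu.Unlock()
	t.wg.Wait()
}

// runDemo reports whether a late request was rejected and how many lock
// files remain after draining.
func runDemo() (rejected bool, remaining int, err error) {
	dir, err := os.MkdirTemp("", "csb-locks-")
	if err != nil {
		return false, 0, err
	}
	defer os.RemoveAll(dir)

	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)
	defer stop()

	t := &tracker{dir: dir}
	finish, err := t.start("instance-1")
	if err != nil {
		return false, 0, err
	}
	go func() {
		time.Sleep(50 * time.Millisecond) // stands in for a terraform run
		finish()
	}()

	// Simulate the platform sending SIGTERM to this process (Unix only).
	if err := syscall.Kill(os.Getpid(), syscall.SIGTERM); err != nil {
		return false, 0, err
	}
	<-ctx.Done() // signal received: begin graceful shutdown
	t.drain()

	_, lateErr := t.start("instance-2")
	entries, err := os.ReadDir(dir)
	if err != nil {
		return false, 0, err
	}
	return errors.Is(lateErr, errDraining), len(entries), nil
}

func main() {
	rejected, remaining, err := runDemo()
	fmt.Println("late request rejected:", rejected, "lock files left:", remaining, "err:", err)
}
```

Holding the mutex while checking `draining` and calling `wg.Add` keeps a new operation from slipping in after `drain` has started waiting, which is the same kind of check-then-act gap the description raises for the process-inspection approach.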

* extended inflight operation test to check deprovision
* removed focus test
* removed failing check for SIGTERM, in this case SIGKILL is sent - but not seen in log

A previous commit fixed an issue with LockFilesExist returning an inverted value. The existing drain wait code depended on this incorrect behaviour.
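To illustrate why the polarity of that check matters, here is a small sketch of a drain wait loop. It is an assumption-laden example rather than the broker's code: `lockFilesExist`, `waitForDrain`, `runDrainDemo`, and the lock file name are hypothetical stand-ins. The loop must keep waiting while lock files exist, so a helper that returns the inverted value makes it either exit immediately or wait forever.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// lockFilesExist reports whether dir contains at least one entry.
// Hypothetical stand-in for the LockFilesExist helper mentioned above.
func lockFilesExist(dir string) (bool, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return false, err
	}
	return len(entries) > 0, nil
}

// waitForDrain polls dir until it is empty (returns true) or the timeout
// elapses (returns false).
func waitForDrain(dir string, interval, timeout time.Duration) (bool, error) {
	deadline := time.Now().Add(timeout)
	for {
		busy, err := lockFilesExist(dir)
		if err != nil {
			return false, err
		}
		if !busy {
			return true, nil // no lock files left: safe to stop
		}
		if time.Now().After(deadline) {
			return false, nil
		}
		time.Sleep(interval)
	}
}

// runDrainDemo creates one lock file, removes it shortly afterwards, and
// reports whether waitForDrain observed the directory becoming empty.
func runDrainDemo() (bool, error) {
	dir, err := os.MkdirTemp("", "csb-drain-")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	lock := filepath.Join(dir, "instance-1.lock")
	if err := os.WriteFile(lock, nil, 0o600); err != nil {
		return false, err
	}
	go func() {
		time.Sleep(30 * time.Millisecond) // stands in for in-flight work
		_ = os.Remove(lock)
	}()
	return waitForDrain(dir, 10*time.Millisecond, 2*time.Second)
}

func main() {
	drained, err := runDrainDemo()
	fmt.Println("drained:", drained, "err:", err)
}
```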

I O U a proper message

ifindlay-cci and others added 7 commits September 3, 2024 14:02

feat: added tls support to cloud service broker app

290446f

When running as an app in CF we can rely on the platform to handle TLS setup, but on a VM there is currently no way to encrypt traffic. TPCF-26820

ignore linting for test helper

a4f7e83

The linter is angry about `.` imports, but we do not mind because this is not production code.

feat: cleaned up lock files before each test run

ff7aa6e

* extended inflight operation test to check deprovision
* removed focus test
* removed failing check for SIGTERM, in this case SIGKILL is sent - but not seen in log

ifeat: added unittests for LockFilesExists

75c1ca1

bug: drain wait not working as expected

cc63582

A previous commit fixed an issue with LockFilesExist returning an inverted value. The existing drain wait code depended on this incorrect behaviour.

we should block in shutdown after shutdown was called

7d80274

I O U a proper message

nouseforaname marked this pull request as draft September 10, 2024 09:02

FelisiaM changed the title ~~Feat: enable draining csb via os signal handling~~ feat: enable draining csb via os signal handling Sep 10, 2024

nouseforaname marked this pull request as ready for review September 10, 2024 09:39

FelisiaM added 2 commits September 10, 2024 10:57

Tidy up fakes and focused test

93d1e18

Merge branch 'main' into feat_enable_os_signal_handling

0fb301b

nouseforaname closed this Sep 10, 2024
