Surface provisioning failures to user #901

morgsmccauley · 2024-07-19T02:00:50Z

With the move to user-owned logs tables, we removed the provisioning failure logs. If we fail to provision, it may not be possible to log the failure. We need to surface this information somehow, otherwise developers won't know what's gone wrong.

We have two options:

1. Re-introduce a shared/permanent logs table, specifically for provisioning failures
2. Log failures to the user-owned table

I'm leaning towards 2, but it is tricky given that provisioning of the logs table itself may fail. However, most failures happen while provisioning user-related resources, so we have some confidence that the logs table should always be provisioned unless something is seriously wrong. With the correct alerts and testing (which we have) in place, we should be able to catch real provisioning failures through our own channels, and surface user-related failures via their logs table.

This PR updates provisioning so that user-related errors are written to the user-owned logs table, allowing users to debug issues with their schema. Logging to the user table is tricky, since this table is created _during_ the provisioning step itself. Therefore, I have split provisioning into two phases:

1. System Resources - Sets up system-related entities: database, schema, logs table/jobs etc.
2. User Resources - Applies user schema, configures Hasura etc.

This separation allows us to isolate the tasks which are likely to fail due to user error, and therefore only surface errors which are relevant. The creation of the logs table _should always succeed_; if it doesn't, there is something wrong with the system, i.e. some form of bug has been introduced. Errors thrown during the System phase of provisioning will be logged as errors on the machine, and I will tune the existing alert so that we are notified of these errors. Additionally, I have converted all non-critical error logs to warnings, so that we don't get alerted on non-issues.

closes: #901
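The two-phase split described above could be sketched roughly as follows. This is a minimal illustration and not the Runner's actual code: `provision`, `Step`, `UserLogger` and `ProvisioningError` are hypothetical names, and the steps are passed in as plain async functions so the control flow is visible.

```typescript
// Hypothetical sketch; none of these names come from the actual codebase.
type Step = () => Promise<void>;

// Assumed stand-in for a writer to the user-owned logs table.
interface UserLogger {
  error(message: string): Promise<void>;
}

// Records which phase failed so callers can tell system bugs from user errors.
class ProvisioningError extends Error {
  phase: 'system' | 'user';

  constructor(phase: 'system' | 'user', message: string) {
    super(message);
    this.phase = phase;
  }
}

function messageOf(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

async function provision(
  systemSteps: Step[],
  userSteps: Step[],
  userLogs: UserLogger,
): Promise<void> {
  // Phase 1: system resources (database, schema, logs table/jobs).
  // The logs table may not exist yet, so failures are only logged to the machine.
  try {
    for (const step of systemSteps) {
      await step();
    }
  } catch (err) {
    console.error('System provisioning failed:', messageOf(err));
    throw new ProvisioningError('system', messageOf(err));
  }

  // Phase 2: user resources (user schema, Hasura configuration).
  // The logs table exists by now, so failures are written where the user can see them.
  try {
    for (const step of userSteps) {
      await step();
    }
  } catch (err) {
    await userLogs.error(`Provisioning failed: ${messageOf(err)}`);
    throw new ProvisioningError('user', messageOf(err));
  }
}
```

In this sketch, a failure in a system step never touches `userLogs`, while a failure in a user step is written to it before the error is rethrown.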

morgsmccauley added component: Runner Ungroomed labels Jul 19, 2024

morgsmccauley removed the Ungroomed label Aug 8, 2024

morgsmccauley mentioned this issue Aug 8, 2024

feat: Surface provisioning failures to user #1002

Merged

morgsmccauley closed this as completed in #1002 Aug 9, 2024

morgsmccauley self-assigned this Aug 9, 2024

morgsmccauley mentioned this issue Aug 11, 2024

Log Data Layer provisioning progress/failures #841

Closed
