With the move to user-owned logs tables, we removed the provisioning failure logs - if provisioning fails, it may not be possible to log the failure. We need to surface this information somehow, otherwise developers won't know what's gone wrong.
We have two options:
1. Re-introduce a shared/permanent logs table, specifically for provisioning failures
2. Log failures to the user-owned table
I'm leaning towards 2, but it is tricky given that provisioning of the logs table itself may fail. However, most failures happen while provisioning user-related resources, so we have some confidence that the logs table will be provisioned unless something is seriously wrong. With the correct alerts and testing (which we have) in place, we should be able to catch real provisioning failures through our own channels, and surface user-related failures via their logs table.
This PR updates provisioning such that user-related errors are written
to the user-owned logs table, allowing users to debug issues with their
schema.
Logging to the user table is tricky, since this table is created
_during_ the provisioning step itself. Therefore, I have split
provisioning into two phases:
1. System Resources - Sets up system-related entities: database,
schema, logs table/jobs etc.
2. User Resources - Applies user schema, configures Hasura etc.
This separation allows us to isolate the tasks which are likely to fail
due to user error, and therefore only surface errors which are relevant.
The creation of the logs table _should always succeed_; if it doesn't,
there is something wrong with the system, i.e. some form of bug has been
introduced. Errors thrown during the System portion of provisioning will
be logged as errors to the machine, and I will tune the existing alert so
that we are notified of these errors.
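
The two-phase split described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `provision`, `Step`, `UserLogger`, and `systemLog` are hypothetical names, and steps are modeled as synchronous functions for brevity, whereas real provisioning would most likely be asynchronous.

```typescript
// Hypothetical sketch: every name here is illustrative, not taken from the codebase.
type Step = () => void;

interface UserLogger {
  // Stands in for a writer that inserts a row into the user-owned logs table.
  error(message: string): void;
}

function provision(
  systemSteps: Step[],
  userSteps: Step[],
  userLogger: UserLogger,
  systemLog: (message: string) => void = console.error,
): void {
  // Phase 1: system resources (database, schema, logs table/jobs).
  // A failure here points to a system bug, so it is logged to the
  // machine, where an alert is assumed to pick it up.
  try {
    for (const step of systemSteps) step();
  } catch (err) {
    systemLog(`System provisioning failed: ${String(err)}`);
    throw err;
  }

  // Phase 2: user resources (user schema, Hasura configuration).
  // The logs table exists by now, so the failure can be surfaced to the user.
  try {
    for (const step of userSteps) step();
  } catch (err) {
    userLogger.error(`Provisioning failed: ${String(err)}`);
    throw err;
  }
}
```

In this sketch both phases rethrow after logging, so the caller still sees the failure; only the destination of the log message differs between the two phases.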
Additionally, I have converted all non-critical error logs to warnings,
so that we don't get alerted on non-issues.
closes: #901