There's always going to be a delay between us detecting that the db has entered maintenance mode and bringing down all relevant jobs. This leaves the possibility of spurious errors (or even, possibly, spurious results) from jobs which run during this window.
Having not seen this type of error in practice before, we've had two in quite close succession recently:
This job, which errored because the CodedEvent_SNOMED table had disappeared (Slack thread).
And this job, where the only plausible explanation was that the population query ran against different data from the final results query.
We've raised with TPP the idea of building in a delay on their end between announcing the start of maintenance mode and actually making changes to the database, to give us time to shut down gracefully:
Asked TPP: https://bennettoxford.slack.com/archives/C010SJ89SA3/p1701786382187729
However, both of these were cohortextractor jobs, which explicitly handled this case by exiting with error code 4, which job-runner reports to the user (and doesn't bubble up to INTERNAL_ERROR).
This is possibly left-over defence in depth from before we had maintenance mode, perhaps? Or
I didn't find any other instances of the second type of failure, which presumably only ehrql would generate anyway.
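For reference, here is a rough sketch of the exit-code pattern mentioned above, where a job that fails while the db is in maintenance mode exits with a dedicated code instead of surfacing as a generic error. This is not the actual cohortextractor or job-runner code: `run_job`, `run_queries`, `in_maintenance_mode` and the constant name are all hypothetical, and only the value 4 comes from this thread.

```python
# Hypothetical name; the thread only says that cohortextractor exits with code 4.
DATABASE_MAINTENANCE_EXIT_CODE = 4


def run_job(run_queries, in_maintenance_mode):
    """Run a job's queries and return a process exit code.

    run_queries: callable that executes the job's queries (may raise).
    in_maintenance_mode: callable returning True if the db currently
        reports maintenance mode. Both are assumed interfaces.
    """
    try:
        run_queries()
    except Exception:
        # A failure during maintenance is probably spurious (e.g. a table
        # has temporarily disappeared), so report it with the dedicated
        # exit code rather than as an unexplained error.
        if in_maintenance_mode():
            return DATABASE_MAINTENANCE_EXIT_CODE
        # Otherwise it is a genuine failure: let it propagate.
        raise
    return 0
```

A caller would pass the result to `sys.exit(...)`. Note that this only covers jobs that fail outright; it does nothing for the second case, where queries succeed but run against inconsistent data.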