You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
DmlHandler is lazy loaded before running user's indexer code. The problem is part of the class creation involves polling Hasura for credentials for the user's DB. These are used to then initialize a postgres client. The hasura call can fail due to timeout. If this happens, the error is stored inside the promise until the DML Handler is awaited. This happens when context.db is next called and this call is not within a try catch. The result is that the error is not caught. Unhandled exceptions in NodeJS actually kill the thread. Since the DmlHandler.create was written in a try catch but the code execution context leaves that code to continue, the promise is considered unhandled and uncaught, it seems. As a result, the worker thread is killed immediately.
We can fix the issue in two phases:
Phase 1 (Immediate patch)
DmlHandler's create call needs to be reworked so that it always happens AFTER provisioning (Which requires it to stay in runFunctions), but leverages the natural retry mechanism of runFunctions by catching and throwing the error. Ideally, we also want to ensure if DmlHandler functions were not called, that a failed DmlHandler connect doesn't cause a retry when there was no need.
Phase 2 (Integration into Control Plane)
Migrate the Hasura call outside of runFunction call path as part of #426. We can do this because when the executor is started, we can expect Coordinator to let Runner know of provisioning status, and we don't have to do guess work on if Hasura even has user DB credentials or not. DmlHandler likely can also be moved outside of runFunctions as well, as a result.
In addition, moving the Hasura calls outside of runFunctions will improve latency of calls, speeding up backfill performance. Actual impact of migrating the calls out will be known once Instrumentation is completed and released.
The text was updated successfully, but these errors were encountered:
DmlHandler is lazy loaded before running user's indexer code. The problem is part of the class creation involves polling Hasura for credentials for the user's DB. These are used to then initialize a postgres client. The hasura call can fail due to timeout. If this happens, the error is stored inside the promise until the DML Handler is awaited. This happens when context.db is next called and this call is not within a try catch. The result is that the error is not caught. Unhandled exceptions in NodeJS actually kill the thread. Since the DmlHandler.create was written in a try catch but the code execution context leaves that code to continue, the promise is considered unhandled and uncaught, it seems. As a result, the worker thread is killed immediately.
We can fix the issue in two phases:
Phase 1 (Immediate patch)
DmlHandler's create call needs to be reworked so that it always happens AFTER provisioning (Which requires it to stay in runFunctions), but leverages the natural retry mechanism of runFunctions by catching and throwing the error. Ideally, we also want to ensure if DmlHandler functions were not called, that a failed DmlHandler connect doesn't cause a retry when there was no need.
Phase 2 (Integration into Control Plane)
Migrate the Hasura call outside of runFunction call path as part of #426. We can do this because when the executor is started, we can expect Coordinator to let Runner know of provisioning status, and we don't have to do guess work on if Hasura even has user DB credentials or not. DmlHandler likely can also be moved outside of runFunctions as well, as a result.
In addition, moving the Hasura calls outside of runFunctions will improve latency of calls, speeding up backfill performance. Actual impact of migrating the calls out will be known once Instrumentation is completed and released.
The text was updated successfully, but these errors were encountered: