Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Health / liveness check for Worker #994

Closed
michaelbromley opened this issue Jul 15, 2021 · 6 comments
Closed

Health / liveness check for Worker #994

michaelbromley opened this issue Jul 15, 2021 · 6 comments
Labels
design 📐 This issue deals with high-level design of a feature @vendure/core

Comments

@michaelbromley
Copy link
Member

Is your feature request related to a problem? Please describe.
Since shifting to the standalone NestJS architecture for the worker in v1.0, we've lost the ability to use the HealthCheckRegistryService to monitor the health of worker processes.

There is also no easy way to do a liveness check for things like kubernetes deployments of the worker process.

Since the worker no longer has a networking layer, we cannot just add an HTTP endpoint, which is the usual way of handling it.

Describe the solution you'd like
Some way to support both the internal HealthCheckRegistryService checks (so it appears in the Admin UI) and also support for a generic liveness check.

Describe alternatives you've considered
Some ideas:

  • Add a http layer just for the liveness check. This might imply using a Nest Microservice rather than a Standalone app. What are the perf / other implications of this?
  • Investigate a non-http way of doing a liveness check. Honestly no idea what this might look like, I'm asking on the Nest discord to see if anyone has any idea about this.
@coderjib
Copy link
Contributor

Just to add, as a use case for which we'd benefit of the worker listening to HTTP requests, the need for it to receive a "push" request to perform jobs. This is needed, for example, when running the worker in a serverless environment, where it is not guaranteed to run continuously but must be "awaken" with an HTTP request.
Ref. discussion #968

@tpimh
Copy link

tpimh commented Jul 19, 2021

One non-http way to implement this would be to create a simple job on health status request, not only it would show if the worker is alive or dead, it would also show how busy it is at the moment based on how long it took to complete the job.

@michaelbromley
Copy link
Member Author

Some suggestions from the NestJS discord:


tak1n — Yesterday at 10:30 AM

Guess it should be a full-blown Nest application which is able to serve HTTP traffic.
Have not that much experience with Nest Microservices or Standalone applications so not sure if you can just enable HTTP transport in a standalone/microservice app. Even if its possible I would first see what are the benefits (performance wise - boot time, runtime, memory usage) and whether its even worth doing that (could be premature optimization in the end no?)

BrunnerLivio — Yesterday at 6:47 PM

Yeah I think it makes sense to go with @tak1n solution. Or are there other ways to do health checks in whatever platform you’re using, aside from HTTP? (Either way I’d expose the endpoint via HTTP since that is anyway kinda the standard)
or maybe from the nest application context you could extract the HealthService and all the indicators you need & simply call these services from a e.g. express service?

Something like (await NestFactory.createApplicationContext(AppModule)).get(HealthService)
https://docs.nestjs.com/standalone-applications#getting-started
Though not 100% sure whether that is possible honestly (did not account for such a use-case)

@michaelbromley
Copy link
Member Author

Ran a few performance tests comparing the difference between bootstrapping the worker as a standalone app (the way we currently do it) vs as a full-blown Nest server including http layer. I also created a new WorkerModule which is just the same as the AppModule but omits the ApiModule import, since we would not need that for the worker.

The code looks like this (irrelevant parts elided):

export async function bootstrapWorker(
    userConfig: Partial<VendureConfig>,
): Promise<{ app: INestApplicationContext; startJobQueue: () => Promise<void> }> {
    const mode: 'standalone' | 'server' = 'standalone';
    const module: 'app' | 'worker' = 'worker';
    console.time('bootstrap');

    // ...

    let workerApp: INestApplication | INestApplicationContext;
    let moduleClass: any;
    if (module === 'app') {
        const appModule = await import('./app.module');
        moduleClass = appModule.AppModule;
    } else {
        const workerModule = await import('./worker.module');
        moduleClass = workerModule.WorkerModule;
    }
    if (mode === 'standalone') {
        const appModule = await import('./app.module');
        workerApp = await NestFactory.createApplicationContext(moduleClass, {
            logger: new Logger(),
        });
    } else {
        const workerModule = await import('./worker.module');
        workerApp = await NestFactory.create(moduleClass, {
            logger: new Logger(),
        });
        await (workerApp as any).listen(config.apiOptions.port + 1, config.apiOptions.hostname);
    }
    // ...
    console.log(
        `Mode: ${mode}, module: ${moduleClass.name} -  The script uses approximately ${
            Math.round(used * 100) / 100
        } MB`,
    );
    console.timeEnd('bootstrap');
    return { app: workerApp, startJobQueue };
}

Here are the results (3 runs of each):

standalone with AppModule (status quo)

The script uses approximately 267.68 MB
bootstrap: 2.174s
The script uses approximately 274.61 MB
bootstrap: 1.882s
The script uses approximately 267.47 MB
bootstrap: 1.964s

server with AppModule

The script uses approximately 292.27 MB
bootstrap: 2.466s
The script uses approximately 290.86 MB
bootstrap: 2.474s
The script uses approximately 290.69 MB
bootstrap: 2.486s

standalone with WorkerModule

The script uses approximately 225.57 MB
bootstrap: 552.972ms
The script uses approximately 224.37 MB
bootstrap: 578.833ms
The script uses approximately 224.32 MB
bootstrap: 561.531ms

server with WorkerModule

The script uses approximately 224.01 MB
bootstrap: 418.242ms
The script uses approximately 224.44 MB
bootstrap: 410.18ms
The script uses approximately 224.87 MB
bootstrap: 431.568ms

Conclusion

My intuition that bootstrapping a full server is much heavier than a standalone app is not correct. The main overhead is the ApiModule which, when omitted by using the WorkerModule, has a drastic impact on memory usage and especially startup time.

So it looks perfectly viable to bootstrap the worker as a full NestJS server app, which will give us an HTTP layer we can use for the health check and at the same time reduce startup time by a factor of ~4-5x, (2s down to 400ms) which is also very helpful for the serverless use-case!

API proposal

When bootstrapping the worker as a Nest server, we need to provide a post & hostname. I propose adding an optional 2nd argument to the bootstrapWorker() function which allows these to be provided. If provided, a server is created. If not provided, a standalone app is created. Both server and standalone app return an object implementing INestApplicationContext, so nothing needs to change regarding the return type of the function.

Health Checks

Still to be solved is how to enable a health check indicator for the worker in the Admin UI using the existing HealthCheckRegistryService. Problems:

  1. How do we tell the server at which address to ping the worker?
  2. There can be multiple workers. They can come on and offline during the lifecycle of the server. How do we communicate this to the server so that it can add/remove checks for each worker?

Perhaps we can have an endpoint on the server than can be called to register a worker? Need to think more about this one.

michaelbromley added a commit that referenced this issue Jul 29, 2021
@coderjib
Copy link
Contributor

Thanks for this improvement, just wanted to understand if this new feature is usable also to let the worker listen to incoming (REST) push requests regarding the tasks to perform, as for the use case I mentioned in my comment above

@michaelbromley
Copy link
Member Author

@artcoded-dev This change will allow the worker to listen for HTTP requests like this:

bootstrapWorker(config)
  .then(worker => worker.startJobQueue())
  .then(worker => worker.startHealthCheckServer({ port: 3020 }))
  .catch(err => {
    console.log(err);
  });
GET http://<host>:3020

{ status: 'ok' }

If your use-case requires some other kind of http endpoint, please provide more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design 📐 This issue deals with high-level design of a feature @vendure/core
Projects
None yet
Development

No branches or pull requests

3 participants