
Jobs are all paused; improve module loading issue #34

Closed
5 of 6 tasks
rivernews opened this issue Mar 7, 2020 · 3 comments

Comments

@rivernews
Owner

rivernews commented Mar 7, 2020

We observed that some code runs multiple times when starting the server, but we weren't able to figure out why; even for objects intentionally written as singletons, the duplicate loading issue still happens. This also potentially explains why npm test cannot terminate without the --exit workaround.

Luckily, we found an SO post about singletons that also mentions avoiding having a module loaded multiple times. The answer refers to a Node.js document, which states that mixing CommonJS and ES module imports causes issues and breaks module caching.

Indeed, when we looked at several process.ts files, we were surprised to find we still have const Bull = require('bull'). This could well be the issue described in the document.
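
For reference, the fix in each process.ts would look roughly like this (the queue name is illustrative, and the default import assumes esModuleInterop or equivalent is enabled):

```typescript
// Before: CommonJS require mixed in among ES imports, which can lead
// to the module being evaluated more than once:
// const Bull = require('bull');

// After: a single, consistent ES import style everywhere
import Bull from 'bull';

const supervisorJobQueue = new Bull('supervisorJobQueue');
```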

After fixing the issue, we should verify locally, and finally in production, that modules are no longer loaded multiple times - e.g. by printing a console.log at module load time and verifying each line is printed only once.
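
A minimal sketch of that check (file name illustrative): a top-level log in the singleton module should print exactly once per process if module caching works.

```typescript
// jobQueueManager.ts - this top-level statement runs once per process
// when the module cache works; seeing the line twice at startup means
// the module is being evaluated twice (e.g. via mixed require/import).
console.log(`jobQueueManager module loaded, pid=${process.pid}`);

export class JobQueueManager {
    // ... singleton members ...
}
```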

Then we should deal with the job pause problem. It looks like on the previous shutdown, the .pause() call marked the job queues in redis as paused, and this state carried over to the new express server. Bull presumably persists queue state in the redis key-value store, so the new server now begins with all queues paused.

How to deal with the paused queues? The simplest, cleanest way is flushdb - this will clear all the state, but it wipes all the job history as well. Since we've finished looking at the previously failed s3 org job and have no further action to take, that job history is not needed. Another approach would be to deploy a server with a .resume() call in the startJobQueue() initializer, but then we'd need to wait hours for all jobs to finish. It's a real pain that we cannot abort the jobs, but I guess we can destroy the container - either through the cli or the kubernetes dashboard.
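
A sketch of the .resume() approach, assuming startJobQueue() is where a queue gets wired up (isPaused() and resume() are standard Bull queue methods):

```typescript
import Bull from 'bull';

export const startJobQueue = async (queue: Bull.Queue) => {
    // A paused flag set by a previous `.pause()` is persisted in redis
    // and survives server restarts, so explicitly resume on startup.
    if (await queue.isPaused()) {
        console.log(`queue '${queue.name}' was left paused; resuming`);
        await queue.resume();
    }
};
```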

Conclusion

  • Unify all module imports to ES modules. No more CommonJS.
  • Remove the .pause(). In dev we're already flushing the db, so it shouldn't matter much.
  • Test that modules load only once - remove mocha's --exit flag, run npm test, check for duplicated console logs, and see if the run terminates gracefully.
  • If modules still load multiple times, we need a stricter initialization process (see the sketch after this list). First of all, avoid newing the queue at import time, which is what currently happens in tests. We should have a separate initializer function that news the queue; if that initializer is never called, no queue is created. This avoids unnecessary job queues that have to be finalized and closed, and unnecessary redis connections.
  • Once the local npm test terminates gracefully without --exit, get ready for production. Flush the production db.
  • Deploy to production and wait for the deployment to finish. Then trigger a supervisor job rrr with a small number of review orgs. If everything looks good, do a ccc.
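
A sketch of the stricter initialization described above (the class and queue names match our code, but the exact shape is illustrative):

```typescript
import Bull from 'bull';

export class JobQueueManager {
    private _queue?: Bull.Queue;

    // Nothing is created at import time, so merely importing this module
    // (e.g. from a test file) opens no redis connections.
    public initialize (): Bull.Queue {
        if (!this._queue) {
            this._queue = new Bull('supervisorJobQueue');
        }
        return this._queue;
    }

    public get queue (): Bull.Queue {
        if (!this._queue) {
            throw new Error('job queue not initialized; call initialize() first');
        }
        return this._queue;
    }

    // Tests and graceful shutdown can release the redis connections.
    public async close (): Promise<void> {
        if (this._queue) {
            await this._queue.close();
            this._queue = undefined;
        }
    }
}
```
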
@rivernews
Owner Author

rivernews commented Mar 7, 2020

Got a supervisor job queue not initialized error. However, we can see initialize() did run, and then the s3 controller fired when we gave the command in slack. Not sure why initialize() did not set up .queue, but we have some options to debug:

  • Run .initialize() in the controller if queue is undefined.
  • Test in local dev first.
  • To dig further, we can write set/get accessors for JobQueueManager.queue and make the original queue member private (see the accessor sketch after this list). Then we can debug why the earlier run of initialize() doesn't take effect.
  • Also search SO etc. for why newing the class and then running initialize() isn't working.
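
A sketch of that instrumented accessor (the logging is purely for debugging; names are ours, details illustrative):

```typescript
import Bull from 'bull';

export class JobQueueManager {
    private _queue?: Bull.Queue;

    public initialize (): void {
        console.log(`initialize() in pid=${process.pid}`);
        this._queue = new Bull('supervisorJobQueue');
    }

    public get queue (): Bull.Queue | undefined {
        // Log every read so we can tell whether the controller touches
        // `.queue` before initialize() ran, and in which process.
        console.log(`queue getter in pid=${process.pid}, defined=${!!this._queue}`);
        return this._queue;
    }

    public set queue (q: Bull.Queue | undefined) {
        console.log(`queue setter in pid=${process.pid}`);
        this._queue = q;
    }
}
```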

@rivernews
Owner Author

rivernews commented Mar 7, 2020

Sandbox process caveat

A big, perhaps even surprising insight is that a sandboxed processor runs in a completely separate process from the master one, and thus shares no objects or references with the master process, unless node's process messaging is used (which Bull does under the hood for some functionality). This Bull issue thread therefore suggests that we re-create the queue instance in the sandbox process, even if we already created the queue in the master process. In short: whenever a sandbox process needs to access a queue event or dispatch a job on a queue, it must re-construct that queue instance.
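
A sketch of what that re-creation looks like in a sandboxed processor file (file and queue names illustrative; Bull loads this file into its own child process):

```typescript
// s3OrgsJob.process.ts - Bull runs this file in a separate child
// process, so nothing constructed in the master process is visible here.
import Bull from 'bull';

// Re-create the queue this processor dispatches into; the master's
// instance cannot be shared across the process boundary.
const orgReviewQueue = new Bull('orgReviewQueue');

// Depending on the Bull version, a sandboxed processor may need
// `module.exports =` instead of `export default`.
export default async function (job: Bull.Job) {
    // ... do the actual work, then dispatch a follow-up job via the
    // re-created queue instance ...
    await orgReviewQueue.add({ orgId: job.data.orgId });
}
```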

Some questions immediately emerge: if we are re-creating queue instances, will that duplicate redis clients and create unnecessary redis connections, even though conceptually the number of unique queues is fixed? We sum up the questions here:

  1. Is doing another .initialize() (newing a queue under the hood) in the sandbox process the way to go? We see that this Bull issue also dispatches a job from a sandbox process, which requires a queue instance. That person uses module.exports = new Queue(). I believe this is the same as what we are doing, because the sandbox process has to import the queue module again (it shares no module resources with the master process), so new Queue() runs again in each sandbox process.
  2. If re-constructing the queue is necessary in a sandbox process, is there redis client overhead? Will Bull create another set of redis clients and connections even when we use the createClient() queue option (see the createClient sketch after this list)? Based on this issue comment, it looks like it will: re-creating the queue creates another set of redis client connections.
    • Based on experiment, the redis clients for a queue are indeed created again in the sandbox process - after all, the sandbox process runs the same queue-instantiation code as the master process. But looking at the Bull dashboard, the client connection count doesn't seem to surge. Say we see 6 "client created" logs in the terminal for the master process (3 queues instantiated) - the Bull dashboard shows 13-15 redis client connections and 3 blocked clients. After re-creating queues in the sandbox process, an additional 4 "client created" logs show up (2 queues instantiated), yet the Bull dashboard still shows 15 redis client connections and 3 blocked clients. Does Bull check and reuse redis connections across different queue instances (with the same queue name) in different processes? That is the open question.
    • After some experiments with redis-cli client list, it does in fact create additional clients: however many times you re-create a queue, you get that many additional redis client connections.
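
For reference on point 2, Bull's createClient queue option lets all queues in one process share the non-blocking connections; a sketch assuming ioredis:

```typescript
import Redis from 'ioredis';
import Bull from 'bull';

const REDIS_URL = process.env.REDIS_URL || 'redis://localhost:6379';

// One shared client and one shared subscriber for every queue in this
// process; only the blocking client needs a dedicated connection.
const client = new Redis(REDIS_URL);
const subscriber = new Redis(REDIS_URL);

const queueOptions: Bull.QueueOptions = {
    createClient: type => {
        switch (type) {
            case 'client':
                return client;
            case 'subscriber':
                return subscriber;
            default:
                // 'bclient' issues blocking commands and must not be shared.
                return new Redis(REDIS_URL);
        }
    },
};

export const supervisorJobQueue = new Bull('supervisorJobQueue', queueOptions);
```

Note this only caps connections within a single process; each sandbox process still creates its own set of clients, which matches what redis-cli client list showed.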

So re-creating queues in the sandbox process looks like the way to go. Of course, only re-create the queues that are actually used in that sandbox processor; there's no need to re-create the others.

@rivernews
Owner Author

Solved by initializing queues in the sandbox as well. But this should be kept to a minimum - only do it if that sandbox process really needs that queue.
