Research running queries in parallel across cores #8400
Comments
@m-allanson @pieh @DSchau @rase- and I met to investigate this issue this morning. We identified some low-hanging fruit @DSchau will work on today: #7373 (comment). The next step is to investigate why graphql query running slows dramatically for certain sites, e.g. https://github.com/herotc/bfa.herodamage.com. There are probably some easy ways to speed them up. Then, we discussed your research @Moocar, and it seems that before we can go multi-process, we need to figure out a shared cache strategy, as otherwise copying data around + the duplication of work + caches would eat up most of the gains from going multi-process.
Good stuff. I agree that a shared cache strategy is needed. There's so much ad hoc caching going on in Gatsby. I'm currently researching what would be required to use an in-memory DB. I think there will be significant gains there.
Might my issue (#8330) somehow be related? The error mentions jest-worker, or is it used elsewhere?
@CanRau this is just research for now. I haven't written any code around this yet, so it won't be related to your issue.
yeah that's what I meant in my last line ^^ but it still sounded related ^^
Summary
Suggestion for speeding up query execution using multiple cores.
The Problem(s)
The query execution phase of a Gatsby build is often the slowest, for a variety of reasons, one being that resolvers can perform expensive work such as copying files (e.g. `gatsby-source-filesystem`'s `publicURL()` resolver).
Solution
Run queries across multiple cores. Queries do not depend on each other, so there is potential to run them in parallel. In a sense, we already do this by using `better-queue`'s `concurrency` option, but that only achieves async concurrency: we're still locked to a single core because Node.js is single threaded. Libraries like `jest-worker`, however, make spinning up extra Node processes easy.

The solution would involve creating a number of jest workers before we start consuming from `query-queue.js`. Each worker would have its own copy of the redux `nodes` namespace and so, in theory, would have everything it needs to execute queries. The implementation of `query-runner.js` would then be handled by these jest workers, which would be given batches of queries to execute. This is very similar to how HTML generation works now.
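As a rough illustration, here is a minimal sketch of what that setup could look like. The `./query-worker.js` module and its `runQueryBatch` method are hypothetical names, not Gatsby's actual code:

```js
// Minimal sketch (hypothetical ./query-worker.js module and method names,
// not Gatsby's actual code). jest-worker forks child processes and exposes
// the worker module's exports as promise-returning methods.
const Worker = require("jest-worker").default
const os = require("os")

const worker = new Worker(require.resolve("./query-worker"), {
  exposedMethods: ["runQueryBatch"],
  numWorkers: os.cpus().length,
})

async function runAllQueries(queryBatches) {
  // Each batch goes to whichever child process frees up next.
  const results = await Promise.all(
    queryBatches.map(batch => worker.runQueryBatch(batch))
  )
  await worker.end()
  return results
}
```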
Complications
Cached plugin field resolvers' values
Fields provided by plugins during `setFieldsOnGraphQLNodeType` must be resolved before a query can be run. Since this would be infeasibly slow to do for every query, aggressive caching is performed. Because workers can't share memory, each worker would have to keep its own cache. Another option would be a centralized cache in the main process, or in something like redis. The three caches used in `run-sift.js` are:

- `resolvedNodesCache` - all fully resolved nodes. Key = (type, nodes.length, query filter args)
- `enhancedNodeCache` - a single fully resolved (recursively finished) node. Key = (node, query filter args)
- `enhancedNodesPromiseCache` - an unfinished promise that is still executing (i.e. not yet in `enhancedNodeCache`). Key = (node, query filter args)
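For concreteness, here is a minimal sketch of the kind of keyed cache described above. The key scheme and the `resolveAll` helper are assumptions for illustration, not the actual `run-sift.js` implementation:

```js
// Illustrative (type, nodes.length, filter args) keyed cache, in the
// spirit of resolvedNodesCache. Assumes filter args are JSON-serializable.
const resolvedNodesCache = new Map()

async function getResolvedNodes(type, nodes, filterArgs, resolveAll) {
  const key = `${type}:${nodes.length}:${JSON.stringify(filterArgs)}`
  if (!resolvedNodesCache.has(key)) {
    // resolveAll runs every plugin field resolver over the nodes (slow).
    resolvedNodesCache.set(key, await resolveAll(nodes))
  }
  return resolvedNodesCache.get(key)
}
```

In a multi-process world, each worker would rebuild a map like this independently, which is exactly the duplication a centralized cache would try to avoid.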
Increased Memory Footprint
For large sites, the `nodes` redux namespace is the largest. Each time we start a worker, we have to copy it to the new process, so we could very quickly run out of memory. And it only gets worse the bigger the site is, which annoyingly is the exact use case we're trying to optimize for.

This is amplified by the caching problem above: if we have to maintain multiple copies of the caches too, memory usage will only increase.
There are many possible solutions to this. One would be to use heuristics to determine how many workers to create. For a site that has many queries but not too many nodes, multiple workers will work great, whereas a site with a huge `nodes` namespace but fewer queries won't see a major performance increase and will run out of memory more quickly. We may be able to use these metrics to decide when to run queries in parallel.
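Purely as an illustration of what such a heuristic might look like (the thresholds and memory math here are invented, not measured):

```js
// Toy heuristic (made-up numbers): cap the worker count by CPU count, by
// how many copies of the nodes namespace fit in half of free memory, and
// by query volume (no point forking 8 workers for 50 queries).
const os = require("os")

function chooseWorkerCount(numQueries, approxNodesBytes) {
  const byCpu = os.cpus().length
  const byMemory = Math.floor(os.freemem() / 2 / approxNodesBytes)
  const byQueries = Math.ceil(numQueries / 100)
  return Math.max(1, Math.min(byCpu, byMemory, byQueries))
}
```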
Resolvers record page dependencies
Almost every GraphQL resolver records a page dependency if a result is found. This results in an action being reduced into the redux `componentDataDependencies` namespace. If query workers are in separate processes, they won't have access to that redux store. Some solutions might be:
1. Return dependencies that need to be recorded with the result
Rather than recording the dependency within the resolver's gqlType, create a data structure that is somehow returned with the query result. Then, in the main process, we can iterate through that list and save the new dependencies. I'm not sure how we would communicate that data structure up through all the layers of graphql. We could also expose a queue or data structure in the context that the resolvers can edit directly. A sketch of the context-based variant is below.
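A minimal sketch of that context-based variant, with hypothetical names (`recordDependency`, `createPageDependency`); this is not how Gatsby currently wires its query context:

```js
// In the worker process: collect dependencies in the GraphQL context and
// return them alongside the result instead of dispatching to redux locally.
const { graphql } = require("graphql")

async function runQueryInWorker(schema, query, pagePath) {
  const pageDependencies = []
  const context = {
    path: pagePath,
    // Resolvers call this instead of dispatching a redux action.
    recordDependency: dep => pageDependencies.push(dep),
  }
  const result = await graphql(schema, query, null, context, {})
  return { result, pageDependencies } // serialized back over IPC
}

// In the main process, after the worker returns:
// pageDependencies.forEach(dep => store.dispatch(createPageDependency(dep)))
```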
2. Figure out how to move dependency recording out of resolvers
Every resolver must explicitly record a page dependency if a result is found. This occurs all throughout the code base, and plugin authors must remember to do it too. Instead, there might be a way to record dependencies by iterating through the query result (no idea if that's possible).
Node Tracking
In `run-sift.js`, we resolve all query args (to realize custom plugin fields). Whenever a new field value is resolved, we track its dependency back to its root source node. This root node tracking is stored in the `rootNodeMap` global var in `node-tracking.js`. Out-of-process workers will not have access to this var.

A solution here would be similar to whatever we figure out above for recording page dependencies.
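For context, the tracking works along these lines (a simplified sketch of the idea, not the exact `node-tracking.js` code):

```js
// Map every resolved value (object) back to the id of its root source
// node, so later lookups can find the owning node. Because this lives in
// module-level state, a freshly forked worker starts with an empty map.
const rootNodeMap = new WeakMap()

function trackInlineObjects(value, rootNodeId) {
  if (value && typeof value === "object") {
    rootNodeMap.set(value, rootNodeId)
    // Recurse into nested objects/arrays (cycles ignored for brevity).
    Object.values(value).forEach(v => trackInlineObjects(v, rootNodeId))
  }
  return value
}
```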
Sharp Processing is still high cost
The `sharp` plugin resolver is one of the heaviest users of CPU and is already optimized to use all cores, so sites whose query phase is mostly taken up by image processing wouldn't see a noticeable improvement from parallel query workers.
Global state in custom plugin fields
Plugins that perform heavy processing (e.g. sharp, sqip, markdown) have their own global state, mostly local caches and in-memory queues. These caches would be built from scratch in each worker, which is not ideal.
This becomes a problem if the same node field gets resolved over and over again. Firstly, the caches on each worker will contain copies of the same data, increasing memory use. But the bigger issue is that cached promises won't be shared. Imagine a plugin's field resolver takes 2s to execute and 4 queries reference the same node. If all 4 of those queries execute at the same time and are split across 4 workers, each core will repeat the work, whereas right now they can check the same shared memory to see whether the operation is already executing. The sketch below shows the single-process promise sharing that would be lost.
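A minimal sketch of that promise sharing, in the spirit of `enhancedNodesPromiseCache` (hypothetical names):

```js
// Share the in-flight promise, not just the finished value: four
// concurrent queries touching the same node trigger one 2s resolution.
// Split across 4 worker processes, each would run resolveFields itself.
const inFlight = new Map()

function resolveNodeOnce(node, resolveFields) {
  if (!inFlight.has(node.id)) {
    inFlight.set(node.id, resolveFields(node)) // resolveFields returns a Promise
  }
  return inFlight.get(node.id)
}
```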
Custom plugin fields can call actions
transformer-sharp calls `actions.createJob()`, which updates redux to track a job. This won't work if called from a separate process, and any custom plugin field resolver can call arbitrary functions.

One possible solution is to change how actions are bound so that their payloads are sent back to the central process via Inter-Process Communication (IPC), where the actions take over. But we'd have to make sure transactionality was still maintained.
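A rough sketch of that rebinding, assuming the worker keeps a Node IPC channel open to the main process (the `GATSBY_ACTION` message type and helper names are invented for illustration):

```js
// In the worker: bound "actions" don't dispatch locally; they forward
// their payload to the main process over the IPC channel.
function bindWorkerAction(name) {
  return payload => process.send({ type: "GATSBY_ACTION", name, payload })
}
const actions = { createJob: bindWorkerAction("createJob") }

// In the main process, for each forked child:
// child.on("message", msg => {
//   if (msg.type === "GATSBY_ACTION") {
//     store.dispatch(boundActionCreators[msg.name](msg.payload))
//   }
// })
```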