context id is not preserved from router to supergraph stage in a batch #4019

garypen · 2023-10-11T13:33:04Z

When we create a batch, we create a new Context for each Supergraph that we generate from the Router. This means that the ID will be different for Supergraph stages to at the Router stage.

It's difficult to decide if this is the right thing to do or if we should clone the id from the Router onto the Supergraph. That also has some nice properties.

The text was updated successfully, but these errors were encountered:

garypen · 2024-04-08T09:40:50Z

Following careful review of this decision, within the context of #4661 , we've decided to continue with the current solution (creating a new Context for each SupergraphRequest that we generate from the batched RouterRequest).

In the future, this mechanism could be extended so that we preserve a reference to the RouterRequest context, or at least parts of that context (e.g.: context id). That work is not being performed as part of the current batching solution.

This project is an extension of the existing work to support [client side batching in the router](#126). The current implementation is experimental and is publicly [documented](https://www.apollographql.com/docs/router/executing-operations/query-batching/). The additional work to enable batching requests to subgraphs is captured in [this issue](#2002). Currently the concept of a batch is preserved until the end of the RouterRequest processing. At this point we convert each batch request item into a separate SupergraphRequest. These are then planned and executed concurrently within the router and re-assembled into a batch when they complete. It's important to note that, with this implementation, the concept of a batch, from the perspective of an executing router, now disappears and each request is planned and executed separately. This extension will modify the router so that the concept of a batch is preserved, at least outwardly, so that multiple subgraph requests are "batched" (in exactly the same format as a client batch request) for onward transmission to subgraphs. The goal of this work is to provide an optimisation by reducing the number of round-trips to a subgraph from the router. Additionally, the work will address an [unresolved issue](#4019) from the existing experimental implementation and promote the existing implementation from experimental to fully supported. Fixes #2002  --- **Review Guidance** This is a fairly big PR, so I've written these notes to help make the review more approachable. 1. The most important files to review are (in order): - [.changesets/feat_garypen_2002_subgraph_batching.md](https://github.com/apollographql/router/pull/4661/files#diff-6376c91cfdd47332a662c760ac849bb5449a1b6df6891b30b72b43f041bd836f) - [docs/source/executing-operations/query-batching.mdx](https://github.com/apollographql/router/pull/4661/files#diff-617468db3057857f71c387eaa0d1a6161e3c1b8bf9fcb2de6fc6eafedc147277) - [apollo-router/src/services/router/service.rs](https://github.com/apollographql/router/pull/4661/files#diff-544579a213fda1bff6313834d30fe1746a8a28ffd7c0d6dfa1081fa36a487355) - [apollo-router/src/services/supergraph/service.rs](https://github.com/apollographql/router/pull/4661/files#diff-5d72a88a68962a5926fb5bb115ea3efc186904612f74e697d72e3f009669c733) - [apollo-router/src/query_planner/plan.rs](https://github.com/apollographql/router/pull/4661/files#diff-21a82d277d12e8f21b6b71398d62e95303a117130cc4a27510b85ebfceeb8208) - [apollo-router/src/services/subgraph_service.rs](https://github.com/apollographql/router/pull/4661/files#diff-6ef5a208ca8622f30eef88f75c18566e0304d59856b66293dcd6811555e6382e) - [apollo-router/src/batching.rs](https://github.com/apollographql/router/pull/4661/files#diff-3e884074ecad8176341159a2382aa81c49d74b851894b8ade9fa4718c434dec6) First read the documentation. Hopefully that will make clear how this feature works. I've picked these files as being most important (and ordered them for review) because: router service => This is where we spot incoming batches and create context `BatchQuery` items to manage them through the router. We also re-assemble them on the way back and identify any batches which may need to be cancelled. supergraph service => Here we pick up the information about how many fetches we believe each `BatchQuery` will need to make. plan => The new `query_hashes()` does this fetch identification for us. This is the most important function in this feature. subgraph service => Here's is where we intercept the calls to subgraphs and park threads to wait for batch execution to be performed. We do a lot of work here, so this is where most of the intrusive changes are: assembling and dis-assembling batches and managing the co-ordination between a number of parked tasks. batching => This is the implementation of batch co-ordination. Each batch has a task which manages a variety of channels to facilitate communication between the incoming batches, waiting tasks and outgoing (to subgraph) batches. I'm suggesting reading this *after* reading through the service changes because it should mainly just be implementation details and you will be able to follow what is happening without knowing all this detail initially. Once you understand the changes to the services, you will need to read this code. Feel free to peek ahead though if that's how you like to review stuff. 2. There are still a couple of TODOs which will be resolved early next week. They are both related to how we handle context cloning, so a decision is still pending there. Obviously all the files need to be reviewed, but the remaining files should be fairly mechanical/straight-forward. --- **Checklist** Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review. - [x] Changes are compatible[^1] - [x] Documentation[^2] completed - [x] Performance impact assessed and acceptable - Tests added and passing[^3] - [x] Unit Tests - [x] Integration Tests - ~[ ] Manual Tests~ **Exceptions** *Note any exceptions here* **Notes** [^1]: It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. [^2]: Configuration is an important part of many changes. Where applicable please try to document configuration examples. [^3]: Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions. --------- Co-authored-by: Nicholas Cioli <[email protected]> Co-authored-by: Edward Huang <[email protected]>

garypen self-assigned this Jan 31, 2024

garypen mentioned this issue Feb 15, 2024

Subgraph support for query batching #4661

Merged

5 tasks

garypen closed this as completed Apr 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

context id is not preserved from router to supergraph stage in a batch #4019

context id is not preserved from router to supergraph stage in a batch #4019

garypen commented Oct 11, 2023

garypen commented Apr 8, 2024

context id is not preserved from router to supergraph stage in a batch #4019

context id is not preserved from router to supergraph stage in a batch #4019

Comments

garypen commented Oct 11, 2023

garypen commented Apr 8, 2024