-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[batching] first pass at fit and finish for production #4915
[batching] first pass at fit and finish for production #4915
Conversation
…#4898) Since we made our images more secure, we run our router process as user 'router'. If we are running under 'heaptrack', e.g.: in a debug image, then we cannot write to /dist/data because it is owned by 'root'. This changes the ownership of /dist/data from 'root' to 'router' to allow writes to succeed.
## 🐛 Fixes ### Security fix: update h2 dependency References: - https://rustsec.org/advisories/RUSTSEC-2024-0332 - https://seanmonstar.com/blog/hyper-http2-continuation-flood/ - https://www.kb.cert.org/vuls/id/421644 The router's performance could be degraded when receiving a flood of HTTP/2 CONTINUATION frames, when the Router is set up to terminate TLS for client connections. By [@Geal](https://github.com/geal)
Follow-up to the v1.43.2 being officially released, bringing version bumps and changelog updates into the `dev` branch.
It can be difficult to understand 'router service call failed' messages. Adding the error detail should make them more comprehensible. fixes: #4899
Fix #4834 This extends the schema aware hashing already employed for subgraph queries in entity caching, to be calculated for client queries, and look at that hash in the query planner cache to be able to reuse cached entries across schema reloads. This contains: - an update of the traverse visitor to use an `ExecutableDocument` (necessary to parse field sets in `key` and `requires` argument - parses field sets in `@join__type`'s `key` argument and `@join__field`'s `requires` argument, because they can be affected by schema updates - remove the hack around `_entities` operation - parse subgraph queries using the subgraph schemas extracted from the supergraph schema - update query planner cache warm up to check if the query hash changed
[![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com) This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [which](https://togithub.com/harryfei/which-rs) | dependencies | major | `5.0.0` -> `6.0.1` | --- ### Release Notes <details> <summary>harryfei/which-rs (which)</summary> ### [`v6.0.1`](https://togithub.com/harryfei/which-rs/blob/HEAD/CHANGELOG.md#601) [Compare Source](https://togithub.com/harryfei/which-rs/compare/6.0.0...6.0.1) - Remove dependency on `once_cell` for Windows users, replace with `std::sync::OnceLock`. ### [`v6.0.0`](https://togithub.com/harryfei/which-rs/blob/HEAD/CHANGELOG.md#600) [Compare Source](https://togithub.com/harryfei/which-rs/compare/5.0.0...6.0.0) - MSRV is now 1.70 - Upgraded all dependencies to latest version </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://developer.mend.io/github/apollographql/router). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4xNzMuMCIsInVwZGF0ZWRJblZlciI6IjM3LjI2MS4wIiwidGFyZ2V0QnJhbmNoIjoiZGV2In0=--> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
This attribute is not yet implemented, we are tracking the work to do that in #4830 However until we actually implement it we should remove from the docs
Fix #4880 The entity cache plugin intended to require a `Cache-Control` header from the subgraph to decide whether or not a response should be cached. Unfortunately in the way tit was set up, all responses were stored. The plugin now makes sure that the `Cache-Control` is there, and if a subgraph does not provide it, then the aggregated `Cache-Control` header sent to the client will contain `no-store`. Additionally, the Router will now check that a TTL is configured for all subgraphs, either in per subgraph configuration, or globally.
A lot of changes mainly geared around improving production stability. In no particular order: - Keep a reference to the shared Batch in each BatchQuery - Introduce SubgraphBatchingError and use in many places - set_query_hashes now returns a Result - so does signal_cancelled - and signal_progress - allow our spawned task to return an error - remove many redundant TODO messages - Don't break loops on error, but continue looping and report all errors - Eliminate all expect/unwrap in non test code - Store the batch size in the Batch (may still remove that...) - assemble_batch() returns a Result - fix tests so they don't require TEST_APOLLO_KEY - Introduce BatchInfo to make map_filter easier to read in process_batches() - Process assemble_batch() errors in process_batches() - Enforce info.len() == txs.len() - Add CacheResolverError::BatchingError (and associated code) Quite a lot of changes...
Fixes #3388 In GraphQL requests, `extensions` is an optional map. Passing an explicit `null` was incorrectly considered a parse error. Now it is equivalent to omiting that field entirely, or to passing an empty map. <!-- start metadata --> --- **Checklist** Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review. - [ ] Changes are compatible[^1] - [ ] Documentation[^2] completed - [ ] Performance impact assessed and acceptable - Tests added and passing[^3] - [ ] Unit Tests - [ ] Integration Tests - [ ] Manual Tests **Exceptions** *Note any exceptions here* **Notes** [^1]: It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. [^2]: Configuration is an important part of many changes. Where applicable please try to document configuration examples. [^3]: Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions. --------- Co-authored-by: Bryn Cooke <[email protected]> Co-authored-by: bryn <[email protected]>
CI performance tests
|
Estimates the cost of a query plan by aggregating the costs of operations in the individual fetch nodes. For multiple nodes, the total cost is the sum of the individual costs. For a conditional branch, the cost is the max of the two branches. All deferred nodes count toward the static cost as if they were not deferred. [ROUTER-174](https://apollographql.atlassian.net/browse/ROUTER-174) <!-- start metadata --> --- **Checklist** Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review. - [X] Changes are compatible[^1] - [ ] Documentation[^2] completed - [ ] Performance impact assessed and acceptable - Tests added and passing[^3] - [X] Unit Tests - [ ] Integration Tests - [ ] Manual Tests **Exceptions** *Note any exceptions here* **Notes** [^1]: It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. [^2]: Configuration is an important part of many changes. Where applicable please try to document configuration examples. [^3]: Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions. [ROUTER-174]: https://apollographql.atlassian.net/browse/ROUTER-174?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
Introduces a third method to `CostCalculator` which scores the cost of a GraphQL response. The first stage of scoring a response is to zip it together with the request. The initial implementation does not use the request fields, but this will be required to support custom cost. In that case, we will need to check the corresponding request field definition to see if a custom cost needs to be applied to a particular response field. This information is stored in a counterpart to `serde_json::Value`, which is called `TypedValue`. This `TypedValue` enum pairs each JSON value with the corresponding `apollo_compiler::executable::Field`. Once the response values are paired with their corresponding fields, it is simple to traverse the JSON structure, taking query or schema directives into account. Implements [ROUTER-175](https://apollographql.atlassian.net/browse/ROUTER-175) **Checklist** Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review. - [X] Changes are compatible[^1] - [ ] Documentation[^2] completed - [ ] Performance impact assessed and acceptable - Tests added and passing[^3] - [X] Unit Tests - [ ] Integration Tests - [ ] Manual Tests **Exceptions** *Note any exceptions here* **Notes** [^1]: It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. [^2]: Configuration is an important part of many changes. Where applicable please try to document configuration examples. [^3]: Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions. [ROUTER-175]: https://apollographql.atlassian.net/browse/ROUTER-175?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little worried that if we have the test keys enforced only when checking the result, then it might fail when running it locally without those keys since the feature requires a key.
Besides that, I left a few nits and suggestions!
spawn_handle: JoinHandle<Result<(), BoxError>>, | ||
|
||
/// What is the size (number of input operations) of the batch? | ||
#[allow(dead_code)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want this to be allowed if nothing is using it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now. I'm going to do another pass before starting code review and I'm trying to make my mind up if this is useful (for debug logs) or not. I feel like it might be useful to have it appear in the debug
representation of a Batch
, but rustc
can't work that out.
let (op_name, _context, request, txs) = assemble_batch(requests).await; | ||
let (op_name, _context, request, txs) = assemble_batch(requests) | ||
.await | ||
.expect("it can assemble a batch"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we expect this to never fail, should assemble_batch
not return a result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand. This is a test, so if it does fail, the test will fail and we have an issue to resolve.
})) | ||
.await | ||
.into_iter() | ||
.filter_map(|x: Result<BatchInfo, BoxError>| x.map_err(|e| errors.push(e)).ok()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a super nit, but do we want this lambda to have side-effects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the only way I can think of to handle both errors and oks. Alternative approaches that achieve that are fine by me.
I couldn't think of a good way of enforcing this, so I looked at what file upload does and it, effectively, does the same thing. It's a bit more fiddly here, because we return a If you run
I don't get that behaviour, I just get a bunch of test fails.
|
Detected SAST Vulnerabilities |
A lot of changes mainly geared around improving production stability. In no particular order:
Ref: Ref: #4661