Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add batching cancellations to coprocessor #4846

Merged

Conversation

nicholascioli
Copy link
Contributor

This commit allows cancelled requests by a coprocessor to also cancel portions of a batch query correctly.

This logic should eventually be moved higher up the call stack of the router so that future plugins, custom plugins, and anything else that might cancel portions of a batch query not timeout without needing to copy this commit's logic.


Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible1
  • Documentation2 completed
  • Performance impact assessed and acceptable
  • Tests added and passing3
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

abernix and others added 22 commits March 15, 2024 13:31
…g. The max cost setting is planned to come from the limits block of the configuration, so it is not included in this change. The plugin is currently not registered so it does not appear in the config schema during development
Clippy is complaining

Co-authored-by: bryn <[email protected]>
Created scaffolding for new demand control plugin with a simple config.
The max cost setting is planned to come from the limits block of the
configuration, so it is not included in this change. The plugin is
currently not registered so it does not appear in the config schema
during development

Fixes
[ROUTER-181](https://apollographql.atlassian.net/browse/ROUTER-181)

<!-- start metadata -->
---

**Checklist**

Complete the checklist (and note appropriate exceptions) before the PR
is marked ready-for-review.

- [ ] Changes are compatible[^1]
- [ ] Documentation[^2] completed
- [ ] Performance impact assessed and acceptable
- Tests added and passing[^3]
    - [ ] Unit Tests
    - [ ] Integration Tests
    - [ ] Manual Tests

**Exceptions**

*Note any exceptions here*

**Notes**

[^1]: It may be appropriate to bring upcoming changes to the attention
of other (impacted) groups. Please endeavour to do this before seeking
PR approval. The mechanism for doing this will vary considerably, so use
your judgement as to how and when to do this.
[^2]: Configuration is an important part of many changes. Where
applicable please try to document configuration examples.
[^3]: Tick whichever testing boxes are applicable. If you are adding
Manual Tests, please document the manual testing (extensively) in the
Exceptions.


[ROUTER-181]:
https://apollographql.atlassian.net/browse/ROUTER-181?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
in a Redis cluster, entities can be stored in different nodes, which
makes MGET fails. This splits the MGET query in multiple MGETs calls
grouped by key hash, to make sure each one will get to the corresponding
node, then merges responses in the correct order.
…#4802)

Add the ability to enable heartbeat for cases where the subgraph drops
idle connections.
For example, https://netflix.github.io/dgs/

Example of configuration:

```yaml
subscription:
  mode:
    passthrough:
      all:
        path: /graphql
        heartbeat_interval: enable #Optional
 ```

Fixes #4621

<!-- start metadata -->
---

**Checklist**

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

- [x] Changes are compatible[^1]
- [x] Documentation[^2] completed
- [x] Performance impact assessed and acceptable
- Tests added and passing[^3]
    - [x] Unit Tests
    - [ ] Integration Tests
    - [x] Manual Tests

**Exceptions**

*Note any exceptions here*

**Notes**

[^1]: It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.
[^2]: Configuration is an important part of many changes. Where applicable please try to document configuration examples.
[^3]: Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

---------

Co-authored-by: Jesse Rosenberger <[email protected]>
Each integration tests takes time to link, so this will help with build
times.
It also makes things a bit less chaotic as we can have modules for
integration tests.


<!-- start metadata -->
---

**Checklist**

Complete the checklist (and note appropriate exceptions) before the PR
is marked ready-for-review.

- [ ] Changes are compatible[^1]
- [ ] Documentation[^2] completed
- [ ] Performance impact assessed and acceptable
- Tests added and passing[^3]
    - [ ] Unit Tests
    - [ ] Integration Tests
    - [ ] Manual Tests

**Exceptions**

*Note any exceptions here*

**Notes**

[^1]: It may be appropriate to bring upcoming changes to the attention
of other (impacted) groups. Please endeavour to do this before seeking
PR approval. The mechanism for doing this will vary considerably, so use
your judgement as to how and when to do this.
[^2]: Configuration is an important part of many changes. Where
applicable please try to document configuration examples.
[^3]: Tick whichever testing boxes are applicable. If you are adding
Manual Tests, please document the manual testing (extensively) in the
Exceptions.

Co-authored-by: bryn <[email protected]>
Follow-up to the v1.43.0 being officially released, bringing version
bumps and changelog updates into the `dev` branch.
To enable correlation between trace and logs trace_id and span_id should
be included on the log messages.

 Json:
 ```

{"timestamp":"2024-03-19T15:37:41.516453239Z","level":"INFO","trace_id":"54ac7e5f0e8ab90ae67b822e95ffcbb8","span_id":"9b3f88c602de0ceb","message":"Supergraph
GraphQL response".....
 ```

 Text:
 ```
2024-03-19T15:14:46.040435Z INFO trace_id:
bbafc3f048b6137375dd78c10df18f50 span_id: 40ede28c5df1b5cc router{
 ```

Configuration options have been added in logging for text and json
formats defaulting to true for json and false for text.

Json:
```
   exporters:
     logging:
       stdout:
         format:
           json:
             display_span_id: true
             display_trace_id: true
```

Text:
```
  exporters:
    logging:
      stdout:
        format:
          text:
            display_span_id: true
            display_trace_id: true
```


<!-- start metadata -->
---

**Checklist**

Complete the checklist (and note appropriate exceptions) before the PR
is marked ready-for-review.

- [ ] Changes are compatible[^1]
- [ ] Documentation[^2] completed
- [ ] Performance impact assessed and acceptable
- Tests added and passing[^3]
    - [ ] Unit Tests
    - [ ] Integration Tests
    - [ ] Manual Tests

**Exceptions**

*Note any exceptions here*

**Notes**

[^1]: It may be appropriate to bring upcoming changes to the attention
of other (impacted) groups. Please endeavour to do this before seeking
PR approval. The mechanism for doing this will vary considerably, so use
your judgement as to how and when to do this.
[^2]: Configuration is an important part of many changes. Where
applicable please try to document configuration examples.
[^3]: Tick whichever testing boxes are applicable. If you are adding
Manual Tests, please document the manual testing (extensively) in the
Exceptions.

---------

Co-authored-by: bryn <[email protected]>
Co-authored-by: Edward Huang <[email protected]>
This commit allows cancelled requests by a coprocessor to also
cancel portions of a batch query correctly.

This logic should eventually be moved higher up the call stack of
the router so that future plugins, custom plugins, and anything
else that might cancel portions of a batch query not timeout
without needing to copy this commit's logic.
@nicholascioli nicholascioli requested a review from garypen March 25, 2024 21:44
@nicholascioli nicholascioli self-assigned this Mar 25, 2024
@router-perf
Copy link

router-perf bot commented Mar 25, 2024

CI performance tests

  • reload - Reload test over a long period of time at a constant rate of users
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • large-request - Stress test with a 1 MB request payload
  • const - Basic stress test that runs with a constant number of users
  • no-graphos - Basic stress test, no GraphOS.
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • xxlarge-request - Stress test with 100 MB request payload
  • xlarge-request - Stress test with 10 MB request payload
  • step - Basic stress test that steps up the number of users over time

goto-bus-stop and others added 2 commits March 26, 2024 09:13
…4844)

CI is failing on `choco install cmake` because it can't find the
`cmake.install` dependency. Weirdly it does find that package if you
install it directly. As I understand it, these packages follow an
established chocolatey pattern and so it's not an implementation detail,
so we should be able to rely on `cmake.install` directly.

<!-- start metadata -->
---

**Checklist**

Complete the checklist (and note appropriate exceptions) before the PR
is marked ready-for-review.

- [ ] Changes are compatible[^1]
- [ ] Documentation[^2] completed
- [ ] Performance impact assessed and acceptable
- Tests added and passing[^3]
    - [ ] Unit Tests
    - [ ] Integration Tests
    - [ ] Manual Tests

**Exceptions**

*Note any exceptions here*

**Notes**

[^1]: It may be appropriate to bring upcoming changes to the attention
of other (impacted) groups. Please endeavour to do this before seeking
PR approval. The mechanism for doing this will vary considerably, so use
your judgement as to how and when to do this.
[^2]: Configuration is an important part of many changes. Where
applicable please try to document configuration examples.
[^3]: Tick whichever testing boxes are applicable. If you are adding
Manual Tests, please document the manual testing (extensively) in the
Exceptions.
…on (#4770)

Fix #4569
Fix #4576
Fix #4589
Fix #4590
Fix #4611

When the client closes the connection prematurely, it drops the entire
request handling task, which means that it won't go through the entire
response pipeline, where we record the operation and handle telemetry.
Some users also have additional steps with rhai or coprocessors where
they add metadata, and those steps should run even on canceled requests.
This moves the request handling to a separate task to make sure it runs,
but it also skips subgraph requests if we detected that the client
closed the connection, to prevent unneeded traffic.
Copy link
Contributor

@garypen garypen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I'll merge it now.

@nicholascioli nicholascioli requested a review from a team as a code owner March 26, 2024 11:27
@garypen garypen merged commit 6733f73 into garypen/2002-subgraph-batching Mar 26, 2024
2 of 7 checks passed
@garypen garypen deleted the nc/subgraph-batching/coprocessor branch March 26, 2024 11:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants