document the query plan cache warm up (#3815)
Fix #3704

Co-authored-by: Bryn Cooke <[email protected]>
Co-authored-by: Edward Huang <[email protected]>
3 people authored Sep 22, 2023
1 parent 0996d4f commit 765cf39
Showing 2 changed files with 52 additions and 0 deletions.
13 changes: 13 additions & 0 deletions .changesets/feat_geal_plan_cache_warmup_doc.md
@@ -0,0 +1,13 @@
### Query plan cache warm-up improvements ([Issue #3704](https://github.com/apollographql/router/issues/3704))

The `warmed_up_queries` option enables quicker schema updates by precomputing query plans for your most used cached queries and your persisted queries. When a new schema is loaded, a precomputed query plan for it may already be in the in-memory cache.

We made a series of improvements to this feature to make it more usable:
* It is now active by default and warms up the cache with the 30% most used queries from the previous cache. The number of warmed-up queries is still configurable, and warm-up can be deactivated by setting it to 0 (see the configuration sketch after this list).
* We added new metrics to track the time spent loading a new schema and planning queries during the warm-up phase. You can also measure query plan cache usage to see how many entries are used, as well as the cache hit rate, for both the in-memory cache and the distributed cache.
* Warm-up now plans queries in random order, to make sure that the work can be shared by multiple router instances using distributed caching.
* Persisted queries are included in the warmed-up queries.
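For example, here is a minimal sketch of deactivating warm-up, reusing the `warmed_up_queries` option documented below (the specific value shown is only illustrative):

```yaml title="router.yaml"
supergraph:
  query_planning:
    # Setting this to 0 deactivates query plan cache warm-up;
    # any positive value pre-plans that many of the most used operations.
    warmed_up_queries: 0
```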

You can get more information about operating the query plan cache and its warm-up phase in the [documentation](https://www.apollographql.com/docs/router/configuration/in-memory-caching#cache-warm-up).

By [@Geal](https://github.com/Geal) in https://github.com/apollographql/router/pull/3815 https://github.com/apollographql/router/pull/3801 https://github.com/apollographql/router/pull/3767 https://github.com/apollographql/router/pull/3769 https://github.com/apollographql/router/pull/3770
39 changes: 39 additions & 0 deletions docs/source/configuration/in-memory-caching.mdx
Expand Up @@ -42,6 +42,45 @@ supergraph:
limit: 512
```
### Cache warm-up

When loading a new schema, the query plan for some queries might change, so their cached query plans cannot be reused.

To prevent increased latency when the query plan cache is invalidated, the Router precomputes query plans for:
* The most used queries from the cache.
* The entire list of persisted queries.

Precomputed plans are stored in the cache before the Router switches traffic over to the new schema.

By default, the Router warms up the cache with 30% of the queries already in the cache, but this can be configured as follows:
```yaml title="router.yaml"
supergraph:
query_planning:
# Pre-plan the 100 most used operations when the supergraph changes
warmed_up_queries: 100
```
To get more information on the planning and warm-up process, use the following metrics
(`<storage>` can be `redis` for the distributed cache or `memory` for the in-memory cache):

* counters:
  * `apollo_router_cache_size{kind="query planner", storage="<storage>"}`: current size of the cache (only for the in-memory cache)
  * `apollo_router_cache_hit_count{kind="query planner", storage="<storage>"}`
  * `apollo_router_cache_miss_count{kind="query planner", storage="<storage>"}`

* histograms:
  * `apollo_router_query_planning_time`: time spent planning queries
  * `apollo_router_schema_loading_time`: time spent loading a schema
  * `apollo_router_cache_hit_time{kind="query planner", storage="<storage>"}`: time to get a value from the cache
  * `apollo_router_cache_miss_time{kind="query planner", storage="<storage>"}`

Typically, you would look at `apollo_router_cache_size` and the cache hit rate (derived from `apollo_router_cache_hit_count` and `apollo_router_cache_miss_count`) to determine the right size of the in-memory cache,
then look at `apollo_router_schema_loading_time` and `apollo_router_query_planning_time` to decide how much time to spend warming up queries.
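As a sketch, these metrics can be scraped once a metrics exporter is enabled. The snippet below assumes the Router's built-in Prometheus endpoint; check the telemetry documentation for the options available in your Router version:

```yaml title="router.yaml"
telemetry:
  metrics:
    prometheus:
      # Expose the metrics above (cache size, hit/miss counters, planning and
      # schema loading histograms) on a local Prometheus endpoint.
      enabled: true
      listen: 127.0.0.1:9090
      path: /metrics
```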

#### Cache warm-up with distributed caching

If the Router is using distributed caching for query plans, the warm-up phase also stores the new query plans in Redis. Since all Router instances are likely to have the same distribution of queries in their in-memory caches, the list of queries is shuffled before warm-up, so each Router instance plans queries in a different order and shares its results through the distributed cache.
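A sketch of what that setup can look like, assuming the experimental distributed caching option for query plans (the exact key names and values shown here are illustrative and may differ across Router versions; check the distributed caching documentation):

```yaml title="router.yaml"
supergraph:
  query_planning:
    experimental_cache:
      in_memory:
        # Per-instance in-memory query plan cache size.
        limit: 512
      redis:
        # Shared Redis cache: warmed-up plans written by one instance
        # become available to the other instances.
        urls: ["redis://localhost:6379"]
```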
## Caching automatic persisted queries (APQ)

**Automatic Persisted Queries** (**APQ**) enable GraphQL clients to send a server the _hash_ of their query string, _instead of_ sending the query string itself. When query strings are very large, this can significantly reduce network usage.
