add a warning if we think istio-proxy injection is causing problems (#3545)

We have encountered situations where the injection of istio-proxy into a
router pod (executing in Kubernetes) causes hard-to-diagnose networking
errors during uplink retrieval.

The root cause of the issue is that the router is executing and
attempting to retrieve uplink schemas while istio-proxy is modifying
the pod's network configuration.

This warning message will direct users to information which should help
them to configure their cluster or pod to avoid this problem.

fixes: #3533


**Checklist**

Complete the checklist (and note appropriate exceptions) before a final
PR is raised.

- [x] Changes are compatible[^1]
- [x] Documentation[^2] completed
- [x] Performance impact assessed and acceptable
- Tests added and passing[^3]
    ~~- [ ] Unit Tests~~
    ~~- [ ] Integration Tests~~
    - [x] Manual Tests

**Exceptions**

This is difficult to test because the root cause (istio re-configuring
networking while the pod executes) is very hard to reproduce in a test
environment.

Manual testing was performed by:
- triggering a nightly build
- deploying the resulting image to a test cluster
- using the istio pod annotation `proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": false }'`
  (which lets the router start before istio-proxy is ready) to ensure the
  error occurred
- observing that the desired warning message is produced

Here's some sample output from the test:
```
{"timestamp":"2023-08-07T13:04:15.762898Z","level":"WARN","message":"If your router is executing within a kubernetes pod, this failure may be caused by istio-proxy injection. See #3533 for more details about how to solve this","target":"apollo_router::uplink"}
{"timestamp":"2023-08-07T13:04:15.782715Z","level":"ERROR","message":"fetch failed from all endpoints","target":"apollo_router::router::event::schema"}
```
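Although the istio race itself is hard to reproduce, the detection logic can be exercised without istio by building a synthetic error chain. The following is a minimal sketch, not code from this commit: `Layer` and `layer` are hypothetical stand-ins for the reqwest, connector, and OS error layers, whose real types and nesting depend on the library versions in use. `looks_like_istio_race` is a hypothetical free-function extraction of the closure body in `apollo-router/src/uplink/mod.rs`.

```rust
use std::error::Error;
use std::fmt;

// The message text the commit matches on (EADDRNOTAVAIL, errno 99 on common Linux targets).
const ISTIO_SYMPTOM: &str = "tcp connect error: Cannot assign requested address (os error 99)";

// Hypothetical stand-in for one layer of the client/connector/OS error chain.
#[derive(Debug)]
struct Layer {
    msg: String,
    source: Option<Box<dyn Error>>,
}

impl fmt::Display for Layer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.msg)
    }
}

impl Error for Layer {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref()
    }
}

// Hypothetical helper that builds one layer of a synthetic chain.
fn layer(msg: &str, source: Option<Box<dyn Error>>) -> Box<dyn Error> {
    Box::new(Layer { msg: msg.to_string(), source })
}

// Hypothetical extraction of the commit's check: look exactly two levels
// down the `source()` chain and compare the message text.
fn looks_like_istio_race(e: &dyn Error) -> bool {
    e.source()
        .and_then(|inner| inner.source())
        .map(|os_err| os_err.to_string().contains(ISTIO_SYMPTOM))
        .unwrap_or(false)
}

fn main() {
    let matching = layer("outer", Some(layer("middle", Some(layer(ISTIO_SYMPTOM, None)))));
    let unrelated = layer(
        "outer",
        Some(layer(
            "middle",
            Some(layer("tcp connect error: Connection refused (os error 111)", None)),
        )),
    );
    println!("matching chain: {}", looks_like_istio_race(&*matching));
    println!("unrelated chain: {}", looks_like_istio_race(&*unrelated));
}
```

This prints `matching chain: true` followed by `unrelated chain: false`. Extracting the check into a pure function like this is one way the manual test could become the automated test that footnote 3 asks for.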

**Notes**

[^1]: It may be appropriate to bring upcoming changes to the attention
of other (impacted) groups. Please endeavour to do this before seeking
PR approval. The mechanism for doing this will vary considerably, so use
your judgement as to how and when to do this.
[^2]: Configuration is an important part of many changes. Where
applicable, please try to document configuration examples.
[^3]: Tick whichever testing boxes are applicable. If you are adding
Manual Tests:
    - please document the manual testing (extensively) in the Exceptions.
    - please raise a separate issue to automate the test and label it (or
      ask for it to be labeled) as `manual test`

---------

Co-authored-by: Maria Elisabeth Schreiber <[email protected]>
garypen and Meschreiber authored Aug 9, 2023
1 parent 8511a9b commit 5852267
Showing 3 changed files with 36 additions and 1 deletion.
9 changes: 9 additions & 0 deletions .changesets/maint_garypen_3533_istio_warn.md
@@ -0,0 +1,9 @@
### Add a warning if we think istio-proxy injection is causing problems ([Issue #3533](https://github.com/apollographql/router/issues/3533))

We have encountered situations where the injection of istio-proxy in a router pod (executing in Kubernetes) causes networking errors during uplink retrieval.

The root cause is that the router is executing and attempting to retrieve uplink schemas while the istio-proxy is simultaneously modifying network configuration.

This new warning message directs users to information which should help them to configure their Kubernetes cluster or pod to avoid this problem.

By [@garypen](https://github.com/garypen) in https://github.com/apollographql/router/pull/3545
23 changes: 22 additions & 1 deletion apollo-router/src/uplink/mod.rs
```diff
@@ -1,3 +1,4 @@
+use std::error::Error as stdError;
 use std::fmt::Debug;
 use std::time::Duration;
 use std::time::Instant;
@@ -359,7 +360,27 @@ where
     Query: graphql_client::GraphQLQuery,
 {
     let client = reqwest::Client::builder().timeout(timeout).build()?;
-    let res = client.post(url).json(request_body).send().await?;
+    // It is possible that istio-proxy is re-configuring networking beneath us. If it is, we'll see an error something like this:
+    // level: "ERROR"
+    // message: "fetch failed from all endpoints"
+    // target: "apollo_router::router::event::schema"
+    // timestamp: "2023-08-01T10:40:28.831196Z"
+    // That's deeply confusing and very hard to debug. Let's try to help by printing out a helpful error message here
+    let res = client
+        .post(url)
+        .json(request_body)
+        .send()
+        .await
+        .map_err(|e| {
+            if let Some(hyper_err) = e.source() {
+                if let Some(os_err) = hyper_err.source() {
+                    if os_err.to_string().contains("tcp connect error: Cannot assign requested address (os error 99)") {
+                        tracing::warn!("If your router is executing within a kubernetes pod, this failure may be caused by istio-proxy injection. See https://github.com/apollographql/router/issues/3533 for more details about how to solve this");
+                    }
+                }
+            }
+            e
+        })?;
     tracing::debug!("uplink response {:?}", res);
     let response_body: graphql_client::Response<Query::ResponseData> = res.json().await?;
     Ok(response_body)
```
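The nested `if let` blocks above inspect one fixed depth of the `source()` chain and match the full message text, including `(os error 99)`. If the nesting or the message differs in another reqwest or hyper version, the warning would stop firing. The following is a minimal sketch of a more tolerant alternative, not the router's implementation. It walks the whole chain, prefers a typed `std::io::ErrorKind::AddrNotAvailable` check, and keeps a shorter message match as a fallback. `Wrapped`, `wrap`, `any_in_chain` and `is_addr_not_available` are hypothetical names. Whether reqwest and hyper expose the underlying `std::io::Error` through `source()`, rather than only its message, is an assumption to verify against the versions in use.

```rust
use std::error::Error;
use std::fmt;
use std::io;

// Hypothetical wrapper standing in for the reqwest/hyper error layers.
#[derive(Debug)]
struct Wrapped {
    msg: &'static str,
    source: Box<dyn Error>,
}

impl fmt::Display for Wrapped {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.msg)
    }
}

impl Error for Wrapped {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

fn wrap(msg: &'static str, source: impl Error + 'static) -> Wrapped {
    Wrapped { msg, source: Box::new(source) }
}

/// Returns true if `err` or any error in its `source()` chain satisfies `pred`,
/// however deeply it is nested.
fn any_in_chain(err: &(dyn Error + 'static), pred: impl Fn(&(dyn Error + 'static)) -> bool) -> bool {
    let mut current = Some(err);
    while let Some(e) = current {
        if pred(e) {
            return true;
        }
        current = e.source();
    }
    false
}

/// Typed check for an `io::Error` of kind `AddrNotAvailable` (EADDRNOTAVAIL),
/// falling back to the message text the commit matches on.
fn is_addr_not_available(e: &(dyn Error + 'static)) -> bool {
    let typed = e
        .downcast_ref::<io::Error>()
        .map_or(false, |io_err| io_err.kind() == io::ErrorKind::AddrNotAvailable);
    typed || e.to_string().contains("Cannot assign requested address")
}

fn main() {
    // Three wrappers above the io::Error: one level deeper than the commit inspects.
    let err = wrap(
        "outer",
        wrap(
            "middle",
            wrap(
                "inner",
                io::Error::new(io::ErrorKind::AddrNotAvailable, "simulated address-not-available failure"),
            ),
        ),
    );
    println!("address-not-available found: {}", any_in_chain(&err, is_addr_not_available));
}
```

With this chain, a fixed two-level lookup would land on the `inner` wrapper and miss the `io::Error`. By contrast, `any_in_chain` prints `address-not-available found: true`.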
5 changes: 5 additions & 0 deletions docs/source/containerization/kubernetes.mdx
@@ -254,3 +254,8 @@ If you had a router running on your localhost, with default health-check configu

curl "http://localhost:8088/health"

## Using `istio` with the router

The [istio service mesh](https://istio.io/) is a very popular choice for enhanced traffic routing within Kubernetes.

`istio-proxy` pod injection can cause an [issue](https://github.com/apollographql/router/issues/3533) in the router. The router may start executing at the same time that istio is reconfiguring networking for the router pod. This is an issue with `istio`, not the router, and you can resolve it by following the advice in [istio's injection documentation](https://istio.io/latest/docs/ops/common-problems/injection/#pod-or-containers-start-with-network-issues-if-istio-proxy-is-not-ready).
