[beta] Transparent proxy jobs can be scheduled on nodes without transparent proxy (ie. older versions) #20614

Closed
awanaut opened this issue May 16, 2024 · 2 comments · Fixed by #20623
@awanaut
Contributor

awanaut commented May 16, 2024

Nomad version

Output from nomad version
mix of 1.8-beta+ent and 1.7.4+ent

Operating system and Environment details

Debian 12 client nodes with Nomad 1.8-beta+ent and Consul 1.17.2
Debian 12 client nodes with Nomad 1.7.4+ent and Consul 1.17.2

Issue

When scheduling a job that contains the new transparent_proxy{} block, Nomad seems to ignore that block during placement and can schedule the job on nodes that do not support transparent proxy. In my case those are client nodes running version 1.7.4. To work around this, I added a constraint on nomad.version to ensure the job is scheduled on a node that supports it.
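A minimal sketch of that workaround, assuming the constraint is placed at the group level and that the clients report their version through the `${attr.nomad.version}` node attribute. The version string `1.8.0-beta.1` is an assumption; check what your clients actually report (for example with `nomad node status -verbose`) and adjust it.

```hcl
group "downstream" {
  # Only place this group on clients new enough to support transparent proxy.
  # The value below is an assumed beta version string, not taken from the report.
  constraint {
    attribute = "${attr.nomad.version}"
    operator  = "semver"
    value     = ">= 1.8.0-beta.1"
  }

  # ... network, service, and task blocks as in the job file ...
}
```

The `semver` operator is used here because it orders pre-release versions, so a beta client can satisfy the constraint when the value itself names a pre-release.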

Reproduction steps

  1. Add the following to a group:
     connect {
       sidecar_service {
         proxy {
           transparent_proxy {}
         }
       }
     }
  2. Either use a constraint to force the job onto a client older than 1.8, or set the 1.8 node to ineligible.
  3. Submit the job. Nomad schedules it and reports the allocation as healthy.

Expected Result

I'm sure I could add a health check, but I would expect Nomad to read the transparent_proxy{} block and check the node attributes before making the scheduling decision, just as it does for other attributes.

Actual Result

Nomad schedules transparent_proxy jobs on nodes that do not support transparent proxy.

Job file (if appropriate)

job "downstream" {
  datacenters = ["lab"]

  group "downstream" {
    count = 1

    network {
      port "expose" {}
    }        

    service {
      name = "downstream"
      port = "9090"

      check {
        expose   = true
        type     = "http"
        path     = "/health"
        interval = "30s"
        timeout  = "5s"
       
      }

      connect {
        sidecar_service {
          proxy {
            transparent_proxy {}                                 
          }
        }  
      }
    }          

    task "downstream" {
      driver = "docker"

      config {
        image = "nicholasjackson/fake-service:v0.26.2"  
      }

      env {
        NAME = "downstream"
        UPSTREAM_URIS = "http://upstream.virtual.consul"
      }                   
    }
  }
}
@tgross
Member

tgross commented May 16, 2024

Hi @awanaut! Thanks for this report! I added constraints for the CNI plugin in #20244 but you're right that doesn't restrict the client version appropriately. I'll get this fixed for the final release.

tgross self-assigned this May 16, 2024
tgross added the theme/consul/connect and theme/scheduling labels May 16, 2024
tgross added this to the 1.8.0 milestone May 16, 2024
tgross changed the title from "Transparent proxy jobs can be scheduled on nodes without transparent proxy (ie. older versions)" to "[beta] Transparent proxy jobs can be scheduled on nodes without transparent proxy (ie. older versions)" May 16, 2024
tgross added a commit that referenced this issue May 17, 2024
The new transparent proxy feature already has an implicit constraint on the
presence of the CNI plugin. But if the CNI plugin is installed on an older
version of Nomad, this isn't sufficient to protect against placing tproxy
workloads on clients that can't support it. Add a Nomad version constraint as
well.

Fixes: #20614
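
For illustration only, the existing implicit check on the CNI plugin behaves roughly like the explicit constraint below. This is a sketch, not the code from the commit: the attribute name `plugins.cni.version.consul-cni` is an assumption about how the client fingerprints the plugin. As the commit message notes, a presence check like this can still pass on an older client that happens to have the plugin installed, which is why a Nomad version constraint is needed as well.

```hcl
# Approximation of the implicit CNI plugin check (attribute name assumed).
# It only verifies that the plugin was fingerprinted on the client,
# not that the client's Nomad version supports transparent proxy.
constraint {
  attribute = "${attr.plugins.cni.version.consul-cni}"
  operator  = "is_set"
}
```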
@tgross
Member

tgross commented May 17, 2024

Fix is up in #20623
