Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consul: Include port-label in service registration #5829

Merged
merged 1 commit into from
Jun 13, 2019
Merged

Conversation

endocrimes
Copy link
Contributor

@endocrimes endocrimes commented Jun 13, 2019

fixes #5819

It is possible to provide multiple identically named services with
different port assignments in a Nomad configuration.

We introduced a regression when migrating to stable service identifiers where
multiple services with the same name would conflict, and the last definition
would take precedence.

This commit includes the port label in the stable service identifier to
allow the previous behaviour where this was supported, for example
providing:

service {
  name = "redis-cache"
  tags = ["global", "cache"]
  port = "db"
  check {
    name     = "alive"
    type     = "tcp"
    interval = "10s"
    timeout  = "2s"
  }
}

service {
  name = "redis-cache"
  tags = ["global", "foo"]
  port = "foo"

  check {
    name     = "alive"
    type     = "tcp"
    port     = "db"
    interval = "10s"
    timeout  = "2s"
  }
}

service {
  name = "redis-cache"
  tags = ["global", "bar"]
  port = "bar"

  check {
    name     = "alive"
    type     = "tcp"
    port     = "db"
    interval = "10s"
    timeout  = "2s"
  }
}

in a nomad task definition is now completely valid. Each service
definition with the same name must still have a unique port label. This is
enforced by nomad and would return the following for duplicate service-name/port-name definitions:

Error submitting job: Unexpected response code: 500 (1 error(s) occurred:

* Task group cache validation failed: 1 error(s) occurred:

* Task redis validation failed: 1 error(s) occurred:

* 1 error(s) occurred:

* service "redis-cache" is duplicate)

@endocrimes endocrimes requested a review from schmichael June 13, 2019 13:08
Copy link
Contributor

@notnoop notnoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, but I have some questions:

  • How do we handle resolution of port labels? Do we detect that it's a duplicate service declaration if one specifies port="http" and port="80" (or whatever port, http maps)? I presume not, specially for dynamic port allocation?
  • How does this handle upgrades? Do it have the same limitation as Consul Catalog Integration Fixes #5536 where we flap health check after upgrade client?

It is possible to provide multiple identically named services with
different port assignments in a Nomad configuration.

We introduced a regression when migrating to stable service identifiers where
multiple services with the same name would conflict, and the last definition
would take precedence.

This commit includes the port label in the stable service identifier to
allow the previous behaviour where this was supported, for example
providing:

```hcl
service {
  name = "redis-cache"
  tags = ["global", "cache"]
  port = "db"
  check {
    name     = "alive"
    type     = "tcp"
    interval = "10s"
    timeout  = "2s"
  }
}

service {
  name = "redis-cache"
  tags = ["global", "foo"]
  port = "foo"

  check {
    name     = "alive"
    type     = "tcp"
    port     = "db"
    interval = "10s"
    timeout  = "2s"
  }
}

service {
  name = "redis-cache"
  tags = ["global", "bar"]
  port = "bar"

  check {
    name     = "alive"
    type     = "tcp"
    port     = "db"
    interval = "10s"
    timeout  = "2s"
  }
}
```

in a nomad task definition is now completely valid. Each service
definition with the same name must still have a unique port label however.
@endocrimes
Copy link
Contributor Author

@notnoop WRT upgrades, we flap service health every time you rotate a Nomad Client because of a race condition between task restoration and the Consul client, so we do flap service health, but not because of this change.

The port conflicts question is a little bit tricky, but Tl;DR, it works.

Forcing one to happen is actually fairly difficult.

To specify an explicit port as part of a service definition, you must force it to use driver mode networking (https://www.nomadproject.io/docs/job-specification/service.html#port).

If you were to specify a driver mode service with port 80, and also have a static port assignment in a network stanza foo that also requested port 80, then this would pass our duplicate port validation (

nomad/nomad/structs/structs.go

Lines 5567 to 5588 in efdfef8

func validateServices(t *Task) error {
var mErr multierror.Error
// Ensure that services don't ask for nonexistent ports and their names are
// unique.
servicePorts := make(map[string]map[string]struct{})
addServicePort := func(label, service string) {
if _, ok := servicePorts[label]; !ok {
servicePorts[label] = map[string]struct{}{}
}
servicePorts[label][service] = struct{}{}
}
knownServices := make(map[string]struct{})
for i, service := range t.Services {
if err := service.Validate(); err != nil {
outer := fmt.Errorf("service[%d] %+q validation failed: %s", i, service.Name, err)
mErr.Errors = append(mErr.Errors, outer)
}
// Ensure that services with the same name are not being registered for
// the same port
if _, ok := knownServices[service.Name+service.PortLabel]; ok {
).

In consul/client.go, these would be separate services, because the driver service would have a portlabel 80, and the other service would have portlabel foo.

These both then get registered in consul as two different service ids, that happen to be the same service name. In the case of mixed network modes, this makes sense as one would be via an overlay network and one would be direct-to-host, which is potentially valuable.

In the case of same network-mode, I'd class this as a validation bug, but it's relatively harmless as both are registered correctly.

Copy link
Contributor

@notnoop notnoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the follow up and the detailed explanation!

@endocrimes endocrimes merged commit 17f9616 into master Jun 13, 2019
@endocrimes endocrimes deleted the dani/b-5819 branch June 13, 2019 14:28
@github-actions
Copy link

github-actions bot commented Feb 8, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 8, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: Multiple services with common tag not being registered with Consul
2 participants