Consul services are incorrectly merged when they share the same name [0.9.3] #5852

Closed
Dirrk opened this issue Jun 19, 2019 · 5 comments

Dirrk commented Jun 19, 2019

Nomad version

Nomad v0.9.3 (c5e8b66)

Consul version
Consul v1.4.4
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

Operating system and Environment details

Linux ip-10-***** 4.4.0-1074-aws #84-Ubuntu SMP Thu Dec 6 08:57:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Issue

Background:
I am in the middle of testing 0.9.3 in our development environment, having previously upgraded from 0.8.7.

The following job file excludes the config, but it essentially runs fluentd as a log aggregator. The aggregator uses the Prometheus plugin, which listens on 4 distinct ports, defined in Nomad as metrics, metrics1, metrics2, and metrics3.

Expectation:
In <=0.8.7, 4 services were registered, one per port, each with a single check. In 0.9.3, only the last service stanza is registered as a service, but it carries all 4 health checks.
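
To illustrate my reading of the merge (an assumption based on the service IDs in the reproduction output below, not Nomad's actual registration code): in 0.9.3 the Consul service ID appears to be _nomad-task-<alloc-id>-<task>-<service-name>, which omits the port label, so two stanzas that share a name produce the same ID and the later registration overwrites the earlier one:

service {
  name = "fluentd-aggr-metrics" # ID: _nomad-task-<alloc-id>-fluentd-fluentd-aggr-metrics
  port = "metrics"              # registered first, then overwritten
}

service {
  name = "fluentd-aggr-metrics" # same ID as above, because the port label is not part of it
  port = "metrics3"             # the last stanza wins; all 4 checks attach to it
}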

Reproduction steps

I created this consul-template template to show the difference between the dev and prod clusters:

# dev
{{ range service "fluentd-aggr-metrics@dc1-dev" }}{{ .ID }}:{{ .Port }}
{{end}}

# prod
{{ range service "fluentd-aggr-metrics@dc1-prod" }}{{ .ID }}:{{ .Port }}
{{end}}

Running consul-template produced this:

# dev
_nomad-task-1273de18-e269-331d-8319-bdd95a90859d-fluentd-fluentd-aggr-metrics:24228
_nomad-task-7332b856-0703-7110-7cd4-4828bf7e085d-fluentd-fluentd-aggr-metrics:24228
_nomad-task-63bcd742-eeaa-816d-6f4b-8ab43c8771c9-fluentd-fluentd-aggr-metrics:24228
_nomad-task-04fbcacb-67e8-09e6-1fec-a23d07acca9d-fluentd-fluentd-aggr-metrics:24228
_nomad-task-721114b7-5e2f-57ac-7fb4-1d5e8505e8e6-fluentd-fluentd-aggr-metrics:24228
_nomad-task-81dbf46c-4236-35f2-3a83-d0569a324c61-fluentd-fluentd-aggr-metrics:24228
_nomad-task-8aab451d-55d7-07f8-926a-ec637bfe7a2c-fluentd-fluentd-aggr-metrics:24228
_nomad-task-579a13f6-aed8-4ff8-4fc7-05a6a3cc48dd-fluentd-fluentd-aggr-metrics:24228
_nomad-task-7f1a04c3-128e-88c1-bb4d-167204a34718-fluentd-fluentd-aggr-metrics:24228
_nomad-task-1afc027c-fa13-596c-c6b4-b7ecb68f3aae-fluentd-fluentd-aggr-metrics:24228
_nomad-task-c5182416-fe81-e825-75c4-d96021556c9c-fluentd-fluentd-aggr-metrics:24228


# prod
_nomad-task-j3iexj34wk5uc76srafbmffygid3zvou:24228
_nomad-task-m2ako2xx5b7el4gwwizbywkpnuperzff:24227
_nomad-task-mijpwixyu36z2dw4pdnmldwbikgm5txy:24225
_nomad-task-pez55j4ucreqm3clnpthm7vkkamfpxng:24226
_nomad-task-2ojzgpapznwz7keh5xlgzufzzygy3gnu:24226
_nomad-task-aryyzg3yqmix5yzlcbvvj534y3j5bbo4:24227
_nomad-task-yw23rdhjggagmjkaaasgd6mair5s4dls:24225
_nomad-task-yw4quns3hdqm65xlcudj3fhuteulzgia:24228
_nomad-task-6glver2dj5iw5ouiaheuusnptm3gh4ze:24226
_nomad-task-6iefosznbmu3v53nakj4kbgwnk5wpgq2:24228
_nomad-task-ob2svkjpfatzle3x2w5wuj6pedby3nze:24227
_nomad-task-xwqt6ojcg2ntavtifayuioryrjoubmzs:24225
_nomad-task-cjqfsjbqd7o5ymmhjt7qzsu2wio6tttt:24226
_nomad-task-h352gwmkalg42fvnl6fbawv2rc5ikprm:24227
_nomad-task-mw6i23kkvzoog6yjmfkpkce5ngaphc2i:24225
_nomad-task-uhljxg4yvxhe3cez3n4uhrxedz7ue5ql:24228
_nomad-task-djp7sz3x4bfskn2sfs2jxt4pcre2iug4:24225
_nomad-task-voigpjyn4wnuzz7zuis2xkyouew32b7d:24228
_nomad-task-xxwji5wzxbnbfd65vw2bldqlxy2npati:24227
_nomad-task-znshn66cpb5mr74vfyiobqisjgsongeg:24226
_nomad-task-d5ycznftqlrmmakpzyn63uvsm5wyzgpe:24227
_nomad-task-femcxfrxz7t36xxl3cti3vjirfhdhxlr:24225
_nomad-task-qkqnlom2apiv7d2j677d5szijhguzq7v:24226
_nomad-task-ungpjg3hpi7j6w5iwdhgpkg3cac2efk4:24228
_nomad-task-6kdf2pfg2hmdxhysvwsz262sqdlxpnl5:24225
_nomad-task-7xbnxskckubymsgrl2ybmif77mwikgvt:24227
_nomad-task-nitceg3i73rx2njpzyisaftbncu7g74l:24228
_nomad-task-sc2wv4gntcgeqgrrxrraeh7qskkgzp7k:24226
_nomad-task-3bgtrrjbwf4dxv5oz2evnmo2mb76nui4:24228
_nomad-task-khb2h55bbt3s34vqif55irai63jyov62:24227
_nomad-task-orjua6zf3423ot7tedxkw6jup7l7xt75:24225
_nomad-task-y3a5b2py4epue5vmqb7fpew2ljavjkya:24226
_nomad-task-63mwudx3rsviodeyaydei4xzjh64bq3f:24228
_nomad-task-kbcl3lrasaoylogxoq5u563xd4nh3tic:24225
_nomad-task-kpihay4juy6seuo6gsnhfugh2dnozfft:24227
_nomad-task-vprs6alimy7fonhe7nbetholkm2oxkrh:24226
_nomad-task-227fcbgp4gi2iin7scmtqmwuq5vpedny:24227
_nomad-task-bbyssep4nwlyicaexedqozfu7npgv3c2:24228
_nomad-task-tcnbory3ulszs44ahihu7nqfolr5rqpe:24225
_nomad-task-vrcfs3scygxvwquvczfm3qkfb52vvu2m:24226
_nomad-task-5qauj2ulo42earedjckzr6c3df5zqj6l:24228
_nomad-task-65gbivgsswncobdps63ukeernnk5nxen:24227
_nomad-task-awytbkclkfjkmsjpgawn62exl2l5yri4:24225
_nomad-task-dluee7taa75mskkugn4m4hsskgb3j74p:24226

I verified that the 4 checks were actually registered on the host agent:

curl "http://localhost:8500/v1/agent/checks" | jq '.' | grep 'fluentd-aggr-metrics'
    "Name": "service: \"fluentd-aggr-metrics\" check",
    "ServiceID": "_nomad-task-1273de18-e269-331d-8319-bdd95a90859d-fluentd-fluentd-aggr-metrics",
    "ServiceName": "fluentd-aggr-metrics",
    "Name": "service: \"fluentd-aggr-metrics\" check",
    "ServiceID": "_nomad-task-1273de18-e269-331d-8319-bdd95a90859d-fluentd-fluentd-aggr-metrics",
    "ServiceName": "fluentd-aggr-metrics",
    "Name": "service: \"fluentd-aggr-metrics\" check",
    "ServiceID": "_nomad-task-1273de18-e269-331d-8319-bdd95a90859d-fluentd-fluentd-aggr-metrics",
    "ServiceName": "fluentd-aggr-metrics",
    "Name": "service: \"fluentd-aggr-metrics\" check",
    "ServiceID": "_nomad-task-1273de18-e269-331d-8319-bdd95a90859d-fluentd-fluentd-aggr-metrics",
    "ServiceName": "fluentd-aggr-metrics",

Job file

job "fluentd" {

 group "aggregator" {
  count = 1

  task "fluentd" {
    driver = "docker"
    # removed for privacy

    service {
      name = "fluentd-aggr"
      port = "forward"

      check_restart {
        limit = 3
        grace = "5m"
        ignore_warnings = false
      }

      check {
        type = "tcp"
        port = "forward"
        timeout = "5s"
        interval = "30s"
      }
    }

    service {
      name = "fluentd-aggr-metrics"
      port = "metrics"

      check_restart {
        limit = 3
        grace = "5m"
        ignore_warnings = false
      }

      check {
        type = "tcp"
        port = "metrics"
        interval = "60s"
        timeout = "15s"
      }
    }

    service {
      name = "fluentd-aggr-metrics"
      port = "metrics1"

      check_restart {
        limit = 3
        grace = "5m"
        ignore_warnings = false
      }

      check {
        type = "tcp"
        port = "metrics1"
        interval = "60s"
        timeout = "15s"
      }
    }

    service {
      name = "fluentd-aggr-metrics"
      port = "metrics2"

      check_restart {
        limit = 3
        grace = "5m"
        ignore_warnings = false
      }

      check {
        type = "tcp"
        port = "metrics2"
        interval = "60s"
        timeout = "15s"
      }
    }

    service {
      name = "fluentd-aggr-metrics"
      port = "metrics3"

      check_restart {
        limit = 3
        grace = "5m"
        ignore_warnings = false
      }

      check {
        type = "tcp"
        port = "metrics3"
        interval = "60s"
        timeout = "15s"
      }
    }

    resources {
      cpu = 200
      memory = 1000

      network {
        mbits = 10

        port "forward" {
          static = 24224
        }

        port "metrics" {
          static = 24225
        }
        port "metrics1" {
          static = 24226
        }
        port "metrics2" {
          static = 24227
        }
        port "metrics3" {
          static = 24228
        }
      }
    }
  }
}

Workarounds

Steps I tried to work around the issue:

  1. I added tags with different values (worker0, worker1, etc.) to each service stanza. This resulted in only worker3 being added as a tag to the single merged service; a sketch of the attempted change follows the plan output below.
nomad plan dev/monitoring-system-fluentd.nomad
+/- Job: "monitoring-system-fluentd"
+/- Task Group: "aggregator" (11 in-place update)
  +/- Task: "fluentd" (forces in-place update)
    +/- Service {
        AddressMode: "auto"
        Name:        "fluentd-aggr-metrics"
        PortLabel:   "metrics3"
      + Tags {
        + Tags: "worker3"
        }
        }
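
For reference, the attempted tagging change looked roughly like this (a sketch reconstructed from the description above; the exact tag values are my reading of "worker0, worker1, etc."):

service {
  name = "fluentd-aggr-metrics"
  port = "metrics"
  tags = ["worker0"] # "worker1", "worker2", "worker3" on the other three stanzas
  # check and check_restart blocks unchanged
}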
  2. I changed each service name to a distinct name. This worked and even planned correctly; while it is a usable temporary workaround for me, I can imagine other people might not be able to rename services so easily. A sketch of the renamed stanzas follows the plan output below.
nomad plan dev/monitoring-system-fluentd.nomad
+/- Job: "monitoring-system-fluentd"
+/- Task Group: "aggregator" (11 in-place update)
  +/- Task: "fluentd" (forces in-place update)
    +/- Service {
          AddressMode: "auto"
          Name:        "fluentd-aggr-metrics"
      +/- PortLabel:   "metrics3" => "metrics"
      +/- Check {
            AddressMode:   ""
            Command:       ""
            GRPCService:   ""
            GRPCUseTLS:    "false"
            InitialStatus: ""
            Interval:      "60000000000"
            Method:        ""
            Name:          "service: \"fluentd-aggr-metrics\" check"
            Path:          ""
        +/- PortLabel:     "metrics3" => "metrics"
            Protocol:      ""
            TLSSkipVerify: "false"
            Timeout:       "15000000000"
            Type:          "tcp"
          }
        }
    +   Service {
        + AddressMode: "auto"
        + Name:        "fluentd-aggr-metrics1"
        + PortLabel:   "metrics1"
        + Check {
            AddressMode:   ""
            Command:       ""
            GRPCService:   ""
          + GRPCUseTLS:    "false"
            InitialStatus: ""
          + Interval:      "60000000000"
            Method:        ""
          + Name:          "service: \"fluentd-aggr-metrics1\" check"
            Path:          ""
          + PortLabel:     "metrics1"
            Protocol:      ""
          + TLSSkipVerify: "false"
          + Timeout:       "15000000000"
          + Type:          "tcp"
          + CheckRestart {
            + Grace:          "300000000000"
            + IgnoreWarnings: "false"
            + Limit:          "3"
            }
          }
        }
    +   Service {
        + AddressMode: "auto"
        + Name:        "fluentd-aggr-metrics2"
        + PortLabel:   "metrics2"
        + Check {
            AddressMode:   ""
            Command:       ""
            GRPCService:   ""
          + GRPCUseTLS:    "false"
            InitialStatus: ""
          + Interval:      "60000000000"
            Method:        ""
          + Name:          "service: \"fluentd-aggr-metrics2\" check"
            Path:          ""
          + PortLabel:     "metrics2"
            Protocol:      ""
          + TLSSkipVerify: "false"
          + Timeout:       "15000000000"
          + Type:          "tcp"
          + CheckRestart {
            + Grace:          "300000000000"
            + IgnoreWarnings: "false"
            + Limit:          "3"
            }
          }
        }
    +   Service {
        + AddressMode: "auto"
        + Name:        "fluentd-aggr-metrics3"
        + PortLabel:   "metrics3"
        + Check {
            AddressMode:   ""
            Command:       ""
            GRPCService:   ""
          + GRPCUseTLS:    "false"
            InitialStatus: ""
          + Interval:      "60000000000"
            Method:        ""
          + Name:          "service: \"fluentd-aggr-metrics3\" check"
            Path:          ""
          + PortLabel:     "metrics3"
            Protocol:      ""
          + TLSSkipVerify: "false"
          + Timeout:       "15000000000"
          + Type:          "tcp"
          + CheckRestart {
            + Grace:          "300000000000"
            + IgnoreWarnings: "false"
            + Limit:          "3"
            }
          }
        }
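
A minimal sketch of the renamed stanzas behind the plan above (names taken from the plan output; check_restart omitted for brevity, otherwise each stanza is unchanged from the job file):

service {
  name = "fluentd-aggr-metrics1"
  port = "metrics1"

  check {
    type     = "tcp"
    port     = "metrics1"
    interval = "60s"
    timeout  = "15s"
  }
}

# "fluentd-aggr-metrics2" / "metrics2" and "fluentd-aggr-metrics3" / "metrics3"
# follow the same pattern; the first stanza keeps the name
# "fluentd-aggr-metrics" on port "metrics".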

Dirrk commented Jun 19, 2019

I started digging into the code and found it was already fixed (I think). I did not see it in any changelog, which surprised me since it is a backwards-incompatible fix.

efdfef8#diff-8612d5e61f84976db572b3f38eb380c0

@stevenscg

I ran into this very issue just yesterday while working with my test cluster on 0.9.3 upgraded from 0.8.7. The 2 services my job registered looked like your workaround #1 and had worked for years. I was able to change my service names as a workaround.

@preetapan (Contributor)

@Dirrk You are right that this was fixed recently in master, previously reported as #5819, and it will be available in Nomad 0.9.4. Thanks for bringing the missing changelog entry to our attention; I have updated our changelog now.


Dirrk commented Jun 19, 2019

@preetapan thanks!

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 21, 2022