after changing the domain from default consul to custom cannot reach any service or node #11526

Open
sanfx opened this issue Nov 8, 2021 · 6 comments
Labels
theme/dns: Using Consul as a DNS provider, DNS related issues
type/docs: Documentation needs to be created/updated/clarified
type/question: Not an "enhancement" or "bug". Please post on discuss.hashicorp

Comments

@sanfx

sanfx commented Nov 8, 2021

Overview of the Issue

After changing the domain from the default (consul) to lon in the configuration of all the Consul servers and client agents, I expected to resolve the redis service with dig redis.service.lon, but no service or node can be reached.

Here is the configuration I am using:

{
  "node_name": "rockpix",
  "bind_addr": "{{GetInterfaceIP \"enp1s0\"}}",
  "ui": false,
  "client_addr": "127.0.0.1 {{ GetPrivateIPs }}",
  "data_dir": "/opt/consul",
  "datacenter": "lon",
  "domain": "lon",
  "disable_remote_exec": false,
  "bootstrap_expect": 3,
  "log_level": "INFO",
  "enable_syslog": true,
  "enable_debug": true,
  "server": true,
  "leave_on_terminate": false,
  "skip_leave_on_interrupt": true,
  "rejoin_after_leave": true,
  "advertise_addr": "{{GetInterfaceIP \"enp1s0\"}}",
  "retry_join": [
    "192.168.1.11",
    "192.168.1.40"
  ]
}

On the other nodes only the value of retry_join differs. The only field I added is "domain": "lon".
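
To make the expectation concrete, here is a minimal Python sketch of how the domain setting changes the names Consul DNS should answer for. It assumes the standard `<name>.node.<domain>` and `<name>.service.<domain>` lookup forms; the helper name `expected_dns_names` is hypothetical, and the config is a trimmed copy of the one above.

```python
import json

# Trimmed copy of the agent configuration above; only the fields used here.
CONFIG_JSON = """
{
  "node_name": "rockpix",
  "datacenter": "lon",
  "domain": "lon"
}
"""


def expected_dns_names(config: dict, service: str) -> dict:
    """Return the node and service names Consul DNS should answer for.

    Hypothetical helper. Falls back to "consul", the default domain,
    when "domain" is not set in the configuration.
    """
    domain = config.get("domain", "consul").strip(".")
    return {
        "node": f"{config['node_name']}.node.{domain}",
        "service": f"{service}.service.{domain}",
    }


config = json.loads(CONFIG_JSON)
names = expected_dns_names(config, "redis")
print(names["node"])     # rockpix.node.lon
print(names["service"])  # redis.service.lon
```

With "domain" removed from the config, the same call yields the .consul names instead.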

Reproduction Steps

Steps to reproduce this issue, eg:

  1. Stop all the Consul cluster nodes.
  2. Set "domain": "lon" in the configuration of all the Consul server and client nodes.
  3. Restart the Consul servers and client nodes.

Consul info for both Client and Server

Client info
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 0
build:
	prerelease = 
	revision = c976ffd2
	version = 1.10.3
consul:
	acl = disabled
	bootstrap = false
	known_datacenters = 1
	leader = false
	leader_addr = 192.168.1.40:8300
	server = true
raft:
	applied_index = 31140
	commit_index = 31140
	fsm_pending = 0
	last_contact = 4.879ms
	last_log_index = 31140
	last_log_term = 266
	last_snapshot_index = 30642
	last_snapshot_term = 266
	latest_configuration = [{Suffrage:Voter ID:79321fb9-b766-8b1b-3915-0991c9f8750d Address:192.168.1.11:8300} {Suffrage:Voter ID:bced0c51-052d-41f4-cf63-3d209fc2163d Address:192.168.1.40:8300} {Suffrage:Voter ID:2b8c1d1f-daa3-3719-de1a-1cff592a673c Address:192.168.1.46:8300}]
	latest_configuration_index = 0
	num_peers = 2
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Follower
	term = 266
runtime:
	arch = arm
	cpu_count = 1
	goroutines = 105
	max_procs = 1
	os = linux
	version = go1.16.7
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 38
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 13389
	members = 4
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 5576
	members = 3
	query_queue = 0
	query_time = 1

Server info
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 3
	services = 5
build:
	prerelease = 
	revision = c976ffd2
	version = 1.10.3
consul:
	acl = disabled
	bootstrap = false
	known_datacenters = 1
	leader = false
	leader_addr = 192.168.1.40:8300
	server = true
raft:
	applied_index = 31135
	commit_index = 31135
	fsm_pending = 0
	last_contact = 75.699285ms
	last_log_index = 31135
	last_log_term = 266
	last_snapshot_index = 16387
	last_snapshot_term = 261
	latest_configuration = [{Suffrage:Voter ID:79321fb9-b766-8b1b-3915-0991c9f8750d Address:192.168.1.11:8300} {Suffrage:Voter ID:bced0c51-052d-41f4-cf63-3d209fc2163d Address:192.168.1.40:8300} {Suffrage:Voter ID:2b8c1d1f-daa3-3719-de1a-1cff592a673c Address:192.168.1.46:8300}]
	latest_configuration_index = 0
	num_peers = 2
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Follower
	term = 266
runtime:
	arch = amd64
	cpu_count = 4
	goroutines = 116
	max_procs = 4
	os = linux
	version = go1.16.7
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 38
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 13389
	members = 4
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 5576
	members = 3
	query_queue = 0
	query_time = 1

Operating system and Environment details

OS, Architecture, and any other information you can provide about the environment.
Client : Linux raspberrypi 5.10.63+ #1459 Wed Oct 6 16:40:27 BST 2021 armv6l GNU/Linux
Server1 : Linux rockpi 4.4.154-95-rockchip-gd2ab1f26e1b3 #1 SMP Mon Oct 21 06:42:35 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux

Server2: Linux rockpix 5.11.0-38-generic #42~20.04.1-Ubuntu SMP Tue Sep 28 20:41:07 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Server3: Linux rockpro64 5.10.63-rockchip64 #21.08.2 SMP PREEMPT Wed Sep 8 10:57:23 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux

Log Fragments

Nov 08 19:52:19 raspberrypi consul[13563]: 2021-11-08T19:52:19.464Z [INFO]  agent: Joining cluster...: cluster=LAN
Nov 08 19:52:19 raspberrypi consul[13563]: 2021-11-08T19:52:19.464Z [INFO]  agent: (LAN) joining: lan_addresses=[192.168.1.11, 192.168.1.40]
Nov 08 19:52:19 raspberrypi consul[13563]: 2021-11-08T19:52:19.430Z [INFO]  agent.server: Adding LAN server: server="rockpix (Addr: tcp/192.168.1.11:8300) (DC: lon)"
Nov 08 19:52:19 raspberrypi consul[13563]: 2021-11-08T19:52:19.483Z [INFO]  agent.server.serf.wan: serf: Re-joined to previously known node: rockpro64.lon: 192.168.1.40:8302
Nov 08 19:52:19 raspberrypi consul[13563]: 2021-11-08T19:52:19.430Z [INFO]  agent.server: New leader elected: payload=rockpro64
Nov 08 19:52:19 raspberrypi consul[13563]: 2021-11-08T19:52:19.363Z [INFO]  agent.server: Handled event for server in area: event=member-join server=rockpro64.lon area=wan
Nov 08 19:52:19 raspberrypi consul[13563]: 2021-11-08T19:52:19.443Z [INFO]  agent.server: Handled event for server in area: event=member-join server=rockpix.lon area=wan
Nov 08 19:52:19 raspberrypi consul[13563]: 2021-11-08T19:52:19.387Z [WARN]  agent: grpc: addrConn.createTransport failed to connect to {lon-192.168.1.46:8300 0 pizero.lon <nil>}. Err :connection error: des
Nov 08 19:52:19 raspberrypi consul[13563]: 2021-11-08T19:52:19.387Z [INFO]  agent: Starting server: address=[::]:8500 network=tcp protocol=http

@jkirschner-hashicorp
Contributor

jkirschner-hashicorp commented Nov 8, 2021

@sanfx : I see this is your first issue here - welcome to the Consul community!

Can you share the exact dig command you are using? (How does dig know the IP address and port at which to reach Consul DNS?)

And did the query resolve correctly for redis.service.consul before you changed the domain configuration to lon?

What guides or documentation pages were you following to try to get this working (if any)?
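
One way to take any forwarder out of the picture is to point dig straight at the Consul agent. The sketch below only builds the command line and does not run it. It assumes the agent's DNS interface listens on 127.0.0.1 port 8600 (Consul's default DNS port); `build_dig_command` is a hypothetical helper name.

```python
import shlex


def build_dig_command(name: str, server: str = "127.0.0.1", port: int = 8600) -> list:
    """Build a dig invocation that queries one DNS server directly.

    "@server" and "-p port" are standard dig options. The defaults are an
    assumption about where the Consul agent's DNS interface listens.
    """
    return ["dig", f"@{server}", "-p", str(port), name]


cmd = build_dig_command("redis.service.lon")
print(shlex.join(cmd))  # dig @127.0.0.1 -p 8600 redis.service.lon
```

If this direct query answers while a plain dig for the same name does not, the problem is in whatever forwards queries to Consul rather than in Consul itself.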

@jkirschner-hashicorp jkirschner-hashicorp added theme/dns Using Consul as a DNS provider, DNS related issues type/question Not an "enhancement" or "bug". Please post on discuss.hashicorp labels Nov 8, 2021
@sanfx
Author

sanfx commented Nov 9, 2021

Yes. I have a server node running dnsmasq, so I followed these steps: https://learn.hashicorp.com/tutorials/consul/dns-forwarding#dnsmasq-setup
and added the following to the dnsmasq configuration on that node:

$ cat  /etc/dnsmasq.d/10-consul 
# Enable forward lookup of the 'consul' domain:
server=/consul/127.0.0.1#8600

This node now acts as the primary DNS server for the entire network.
I use dig like this:

$ dig rockpix.node.consul

; <<>> DiG 9.16.1-Ubuntu <<>> rockpix.node.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 61639
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;rockpix.node.consul.		IN	A

;; ANSWER SECTION:
rockpix.node.consul.	0	IN	A	192.168.1.11

;; Query time: 19 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Tue Nov 09 13:36:29 GMT 2021
;; MSG SIZE  rcvd: 64

It works the same way for services as well:

$ dig redis.service.consul

; <<>> DiG 9.16.1-Ubuntu <<>> redis.service.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62916
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;redis.service.consul.		IN	A

;; ANSWER SECTION:
redis.service.consul.	0	IN	A	172.17.0.3
redis.service.consul.	0	IN	A	172.17.0.4

;; Query time: 251 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Tue Nov 09 13:37:12 GMT 2021
;; MSG SIZE  rcvd: 81

For the time being we have reverted the domain setting, but we would like to use lon instead of consul.

@sanfx sanfx changed the title from "after changing the domain from default to consul cannot reach any service or node" to "after changing the domain from default consul to custom cannot reach any service or node" Nov 9, 2021
@jkirschner-hashicorp
Contributor

@sanfx : when you changed the Consul domain to lon, did you also update your dnsmasq configuration accordingly?

$ cat  /etc/dnsmasq.d/10-consul 
# Enable forward lookup of the 'lon' domain:
server=/lon/127.0.0.1#8600
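
Because the forwarding rule has to track the Consul domain, one option is to generate it rather than edit it by hand. This is a minimal sketch, not something the linked guide provides: `dnsmasq_forward_rule` is a hypothetical helper, and the host and port defaults are taken from the rule above.

```python
def dnsmasq_forward_rule(domain: str = "consul", host: str = "127.0.0.1", port: int = 8600) -> str:
    """Return the dnsmasq rule that forwards one domain to Consul DNS.

    Hypothetical helper. The default domain mirrors Consul's default,
    and host/port mirror the rule shown in this thread.
    """
    domain = domain.strip(".")
    if not domain:
        raise ValueError("domain must not be empty")
    return f"server=/{domain}/{host}#{port}"


print(dnsmasq_forward_rule())       # server=/consul/127.0.0.1#8600
print(dnsmasq_forward_rule("lon"))  # server=/lon/127.0.0.1#8600
```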

@jkirschner-hashicorp jkirschner-hashicorp added the waiting-reply Waiting on response from Original Poster or another individual in the thread label Nov 9, 2021
@github-actions github-actions bot removed the waiting-reply Waiting on response from Original Poster or another individual in the thread label Nov 9, 2021
@sanfx
Author

sanfx commented Nov 9, 2021

Thanks @jkirschner-hashicorp, updating the file /etc/dnsmasq.d/10-consul solved the issue. I think this needs to go in the documentation where you explain how to configure dnsmasq, with a note about custom domains.

@sanfx sanfx closed this as completed Nov 9, 2021
@jkirschner-hashicorp
Contributor

Hi @sanfx,

Glad to hear you got this working!

Thanks for the docs improvement suggestion. I'll reopen this task for now and apply the type/docs label so we can keep track of that.

When we submit a PR, I might ping you to ask if you think the change would've helped you, or whether you'd make a different change to the docs.

@jkirschner-hashicorp jkirschner-hashicorp added the type/docs Documentation needs to be created/updated/clarified label Nov 9, 2021
@jkirschner-hashicorp
Contributor

@sanfx: when this wasn't working, what were you using or looking at to troubleshoot? I'm wondering if there's something Consul could've done differently in this case to help (beyond updating the docs page).
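
As one illustration of a troubleshooting aid for this case, the sketch below checks whether a dnsmasq configuration forwards the domain Consul is configured with. It is an assumption-laden example, not an existing Consul feature: the function names are hypothetical, and the parser only handles the single-domain `server=/<domain>/<ip>#<port>` form used in this thread.

```python
import re

# Matches rules of the form server=/<domain>/<ip>#<port>.
# Simplification: only the first domain of a rule is captured.
FORWARD_RULE = re.compile(r"^\s*server=/([^/]+)/")


def forwarded_domains(dnsmasq_text: str) -> set:
    """Collect the domains that have a server=/<domain>/... rule."""
    domains = set()
    for line in dnsmasq_text.splitlines():
        match = FORWARD_RULE.match(line)  # comment lines do not match
        if match:
            domains.add(match.group(1).strip(".").lower())
    return domains


def domain_is_forwarded(consul_domain: str, dnsmasq_text: str) -> bool:
    """True when dnsmasq has a forwarding rule for the Consul domain."""
    return consul_domain.strip(".").lower() in forwarded_domains(dnsmasq_text)


old_rule = (
    "# Enable forward lookup of the 'consul' domain:\n"
    "server=/consul/127.0.0.1#8600\n"
)
print(domain_is_forwarded("consul", old_rule))  # True
print(domain_is_forwarded("lon", old_rule))     # False
```

The second result corresponds to the situation in this issue: Consul answered for lon, but dnsmasq still forwarded only consul.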
