Consul Server logs flooded with WARN message #8663

Closed
danlsgiga opened this issue Sep 11, 2020 · 8 comments · Fixed by #8685
Labels
type/bug Feature does not function as expected

Comments

@danlsgiga

danlsgiga commented Sep 11, 2020

Overview of the Issue

After upgrading from Consul 1.8.0 to Consul 1.8.4, our Consul server logs are being flooded with the following WARN message:

agent.router: Non-server in server-only area: non_server=<MULTIPLE_AGENTS_HOSTNAMES> area=lan

Consul info for both Client and Server

Client info
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 2
	services = 1
build:
	prerelease =
	revision = 12b16df3
	version = 1.8.4
consul:
	acl = enabled
	known_servers = 3
	server = false
runtime:
	arch = amd64
	cpu_count = 8
	goroutines = 57
	max_procs = 8
	os = linux
	version = go1.14.6
serf_lan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 509
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 6688
	members = 63
	query_queue = 0
	query_time = 1
Server info
agent:
	check_monitors = 0
	check_ttls = 1
	checks = 4
	services = 4
build:
	prerelease =
	revision = 12b16df3
	version = 1.8.4
consul:
	acl = enabled
	bootstrap = false
	known_datacenters = 1
	leader = true
	leader_addr = 172.16.1.38:8300
	server = true
raft:
	applied_index = 27447490
	commit_index = 27447490
	fsm_pending = 0
	last_contact = 0
	last_log_index = 27447490
	last_log_term = 2025
	last_snapshot_index = 27445975
	last_snapshot_term = 1990
	latest_configuration = [{Suffrage:Voter ID:0888a1aa-1434-16b0-83d9-5cf5a01d88b9 Address:172.16.2.207:8300} {Suffrage:Voter ID:42421371-1732-6bb7-12ae-102cc6b31710 Address:172.16.1.38:8300} {Suffrage:Voter ID:acbb0f8c-ae13-d973-f548-3189d65f5cc6 Address:172.16.0.218:8300}]
	latest_configuration_index = 0
	num_peers = 2
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Leader
	term = 2025
runtime:
	arch = amd64
	cpu_count = 2
	goroutines = 660
	max_procs = 2
	os = linux
	version = go1.14.6
serf_lan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 509
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 6688
	members = 63
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 912
	members = 3
	query_queue = 0
	query_time = 1

Operating system and Environment details

CentOS Linux release 7.7.1908 (Core)

Linux 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4 23:02:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Log Fragments

Logs from the Consul clients are fine, but logs from the Consul servers are being filled with `agent.router: Non-server in server-only area: non_server=<MULTIPLE_AGENTS_HOSTNAMES> area=lan`, repeated for each Consul client in the cluster.
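
To gauge how bad the flood is, the matching lines can be tallied per client. The following is a minimal Go sketch, assuming the log lines follow the format quoted above. `countNonServerWarnings` and the sample node names are hypothetical and not part of Consul.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// nonServerRe matches the warning text reported in this issue and captures
// the node name and the area. The pattern is an assumption based on the
// message format quoted in the issue.
var nonServerRe = regexp.MustCompile(`Non-server in server-only area: non_server=(\S+) area=(\S+)`)

// countNonServerWarnings (hypothetical helper) counts matching warnings per node name.
func countNonServerWarnings(logs string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		if m := nonServerRe.FindStringSubmatch(sc.Text()); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

func main() {
	// Made-up sample lines; the node names are placeholders.
	logs := strings.Join([]string{
		"[WARN] agent.router: Non-server in server-only area: non_server=client-a area=lan",
		"[INFO] agent: some unrelated line",
		"[WARN] agent.router: Non-server in server-only area: non_server=client-b area=lan",
		"[WARN] agent.router: Non-server in server-only area: non_server=client-a area=lan",
	}, "\n")
	for node, n := range countNonServerWarnings(logs) {
		fmt.Printf("%s: %d\n", node, n)
	}
}
```

In practice the input would come from the server's log file or from the log shipper rather than from an in-memory string.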

@pierresouchay
Contributor

We have the same issue

pierresouchay added a commit to pierresouchay/consul that referenced this issue Sep 15, 2020
When calling `GetDatacentersByDistance()` or `GetDatacentersMap()`, an
incorrect condition was used to display a log message, thus flooding
Consul's logs.

Example of message:

```
  [WARN] agent.router: Non-server in server-only area: non_server=myClientNode area=lan
```

This message is only valid for WAN areas, so filter on the area to avoid
creating hundreds of logs/s on our clusters each time someone calls
these methods.

Our logs were flooded by such messages when migrating our Consul servers
from 1.7.7 to 1.8.4.

This will fix issue hashicorp#8663
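
The idea described in the commit message can be sketched as follows. This is a simplified illustration only, not the code from #8685: the `AreaID` and `member` types and the `nonServerWarnings` function are hypothetical stand-ins for Consul's router internals. It assumes the fix amounts to emitting the warning only when the area is not the LAN area, since clients are expected LAN members.

```go
package main

import "fmt"

// AreaID and member are hypothetical stand-ins, not Consul's real types.
type AreaID string

const (
	AreaLAN AreaID = "lan"
	AreaWAN AreaID = "wan"
)

type member struct {
	Name     string
	IsServer bool
}

// nonServerWarnings returns the warnings that would be logged for one area.
func nonServerWarnings(area AreaID, members []member) []string {
	var warnings []string
	for _, m := range members {
		if m.IsServer {
			continue
		}
		// Client agents are normal in the LAN area, so warn only for
		// other (server-only) areas.
		if area != AreaLAN {
			warnings = append(warnings, fmt.Sprintf(
				"Non-server in server-only area: non_server=%s area=%s", m.Name, area))
		}
	}
	return warnings
}

func main() {
	members := []member{{"server-1", true}, {"myClientNode", false}}
	fmt.Println("LAN warnings:", len(nonServerWarnings(AreaLAN, members)))
	fmt.Println("WAN warnings:", len(nonServerWarnings(AreaWAN, members)))
}
```

With this condition, a client in the LAN pool is skipped silently, while a non-server member showing up in a WAN area still produces the warning.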
@pierresouchay
Contributor

Should be fixed by #8685

pierresouchay added a commit to criteo-forks/consul that referenced this issue Sep 15, 2020
@pierresouchay
Contributor

We applied patch #8685 on our preprod, the issue is gone

@danlsgiga
Author

Thanks @pierresouchay, this bug is very annoying as it's flooding our logs and wasting resources on our ELK stack... an average of 10k messages every 30 min just for this bug!

@pierresouchay
Contributor

@danlsgiga same for us, but we stopped before deploying to prod :)

@danlsgiga
Author

> @danlsgiga same for us, but we stopped before deploying to prod :)

ha, same here... it's isolated to our dev environment... 😅

@danlsgiga danlsgiga changed the title Consul Server filling logs with WARN message Consul Server logs flooded with WARN message Sep 15, 2020
@dnephin dnephin added the type/bug Feature does not function as expected label Sep 15, 2020
@analytically

This merits a minor release IMHO.

@danlsgiga
Author

> This merits a minor release IMHO.

Agreed, I've been checking daily for it but I think folks are busy with HashiConf :)

criteoconsul pushed a commit to criteo-forks/consul that referenced this issue Oct 23, 2020