Now that #3948 has been merged, TCP DNS queries no longer crash when too many services are present. However, DNS performance is still very poor when many nodes are registered, since lookup time increases dramatically with the number of nodes.
Here is a comment that explains how to test it quickly: #3850 (comment)
With a few thousand records registered, here are the results on my laptop in consul agent -dev mode:
Around 1300 records
SRV ~80ms
A ~7ms
2018/04/17 00:21:06 [DEBUG] dns: TCP answer to [{redis.service.consul. 33 1}] too large truncated recs:=418/1308, size:=65503/204941
2018/04/17 00:21:06 [DEBUG] dns: request for name redis.service.consul. type SRV class IN (took 80.247124ms) from client 127.0.0.1:52801 (tcp)
2018/04/17 00:21:06 [DEBUG] dns: request for name redis.service.consul. type A class IN (took 7.149242ms) from client 127.0.0.1:52805 (tcp)
After 5k records
SRV ~100ms
A ~25ms
2018/04/17 00:36:00 [DEBUG] dns: request for name redis.service.consul. type SRV class IN (took 99.704139ms) from client 127.0.0.1:64822 (tcp)
2018/04/17 00:36:00 [DEBUG] dns: TCP answer to [{redis.service.consul. 1 1}] too large truncated recs:=1420/5080, size:=65510/234352
2018/04/17 00:36:00 [DEBUG] dns: request for name redis.service.consul. type A class IN (took 26.310653ms) from client 127.0.0.1:64824 (tcp)
A large part of this behavior is due to the naive method used to truncate records when the response is too big. We therefore propose switching to a binary search to find the optimal number of records; a rough sketch of the idea is shown below.
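For illustration only, here is a minimal sketch of the two truncation strategies, built on the github.com/miekg/dns types that Consul's DNS server uses. The function names, the maxTCPAnswerSize constant, and the synthetic test data are made up for this example, and the real change would also have to keep the additional (Extra) section consistent with the trimmed answers, which is omitted here.

```go
package main

import (
	"fmt"

	"github.com/miekg/dns"
)

// maxTCPAnswerSize is the usual 64 KiB ceiling for a DNS response over TCP.
const maxTCPAnswerSize = 65535

// trimAnswersLinear is the naive approach: drop answers one at a time until
// the encoded message fits. With thousands of records this re-encodes the
// message O(n) times, which dominates the lookup latency.
func trimAnswersLinear(msg *dns.Msg) bool {
	truncated := false
	for len(msg.Answer) > 0 && msg.Len() > maxTCPAnswerSize {
		msg.Answer = msg.Answer[:len(msg.Answer)-1]
		truncated = true
	}
	return truncated
}

// trimAnswersBinary finds the largest prefix of answers that still fits using
// a binary search, so the message is only re-encoded O(log n) times.
func trimAnswersBinary(msg *dns.Msg) bool {
	if msg.Len() <= maxTCPAnswerSize {
		return false
	}
	full := msg.Answer
	// Invariant: a prefix of length lo fits, a prefix of length hi does not.
	lo, hi := 0, len(full)
	for lo+1 < hi {
		mid := (lo + hi) / 2
		msg.Answer = full[:mid]
		if msg.Len() <= maxTCPAnswerSize {
			lo = mid
		} else {
			hi = mid
		}
	}
	msg.Answer = full[:lo]
	return true
}

func main() {
	// Build an oversized answer out of fake A records to compare the two approaches.
	msg := &dns.Msg{}
	msg.SetQuestion("redis.service.consul.", dns.TypeA)
	for i := 0; i < 5000; i++ {
		rr, err := dns.NewRR(fmt.Sprintf("redis.service.consul. 0 IN A 10.0.%d.%d", i/256, i%256))
		if err != nil {
			panic(err)
		}
		msg.Answer = append(msg.Answer, rr)
	}
	linear, binary := msg.Copy(), msg.Copy()
	trimAnswersLinear(linear)
	trimAnswersBinary(binary)
	fmt.Printf("linear kept %d records, binary kept %d records\n", len(linear.Answer), len(binary.Answer))
}
```

Both strategies keep the same records (the longest prefix that fits); the binary search only changes how many times the message has to be measured.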
Will fix hashicorp#4036
Instead of removing the entries one by one, find the optimal size using a binary search.
For SRV records with 5k nodes, the duration of DNS lookups is divided by 4 or more:
SRV records dropped from ~100ms to ~25ms (divided by 4)
A records dropped from ~25ms to ~20ms (less impressive, but still significant)
Example:
2018/04/17 00:40:58 [DEBUG] dns: TCP answer to [{redis.service.consul. 33 1}] too large truncated recs:=413/5080, size:=65457/804580
2018/04/17 00:40:58 [DEBUG] dns: request for name redis.service.consul. type SRV class IN (took 27.502257ms) from client 127.0.0.1:59778 (tcp)
2018/04/17 00:40:58 [DEBUG] dns: TCP answer to [{redis.service.consul. 1 1}] too large truncated recs:=709/5080, size:=65474/234352
2018/04/17 00:40:58 [DEBUG] dns: request for name redis.service.consul. type A class IN (took 20.231774ms) from client 127.0.0.1:59780 (tcp)
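To sanity-check numbers like these on a local agent, a lookup can be timed from Go's standard resolver. This is only a sketch, not the test procedure referenced in #3850; it assumes Consul's DNS endpoint is listening on the default consul agent -dev port 127.0.0.1:8600 and that a redis service is registered.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	// Send lookups over TCP to the local Consul DNS endpoint.
	// 127.0.0.1:8600 is the default for `consul agent -dev`; adjust if needed.
	resolver := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			var d net.Dialer
			// Returning a stream connection makes Go's resolver send the query over TCP.
			return d.DialContext(ctx, "tcp", "127.0.0.1:8600")
		},
	}

	start := time.Now()
	_, srvs, err := resolver.LookupSRV(context.Background(), "", "", "redis.service.consul")
	if err != nil {
		fmt.Println("SRV lookup failed:", err)
		return
	}
	fmt.Printf("SRV lookup returned %d records in %s\n", len(srvs), time.Since(start))
}
```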