
m3coordinator network usage disparity #949

Open
BertHartm opened this issue Sep 26, 2018 · 22 comments

@BertHartm (Contributor) commented Sep 26, 2018

I moved m3coordinator to a dedicated box, so that prometheus is writing to it over the network, and then it's writing to 6 m3db instances over the network.

I would expect network traffic in and out to be roughly equivalent, but I'm seeing about 9.5 MB/s coming into the coordinator and 165 MB/s going back out, which seems surprisingly disproportionate.

I'm running 0.4.1 as released

[attached screenshot]

@richardartoul (Contributor)

@BertHartm Thanks for filing the issue! Could you provide a few more details, specifically:

  1. What is your replication factor?
  2. What are your namespace configurations (i.e., do you have more than one)?

I would expect <NETWORK_OUT> to be at least <REPLICATION_FACTOR> * <NETWORK_IN>, which would explain some of the discrepancy, but there might be more going on here.

@BertHartm (Contributor, Author)

Sure, the replication factor is 2, and I only have 1 (unaggregated) namespace.
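
As a rough sanity check against the bound above: with RF = 2 and about 9.5 MB/s in, it only predicts at least 2 * 9.5 ≈ 19 MB/s out, while the observed 165 MB/s is roughly 8-9x beyond that, so replication alone doesn't explain the gap.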

I'll be spinning up a few other configurations today, so I'll try to post more statistics as I have them.

Other anecdotal stuff that may be relevant: this setup produces more outbound traffic than when I was running the coordinator as a sidecar on the prometheus box, but it also results in a lower prometheus_remote_storage_dropped_samples_total, so I think it might just be better networking making delivery more reliable.

@richardartoul (Contributor) commented Sep 27, 2018

Honestly, now that I think about this a bit more, I think it makes sense once you consider the read workload. If you're reading from this m3coordinator instance, then it is returning uncompressed data, but the data it receives from M3DB is compressed.

So M3DB --> M3Coordinator is compressed (counts towards received), while
M3Coordinator --> Prometheus for reads is uncompressed (counts towards transmitted).

I was only considering the write workload in my original response.

@BertHartm (Contributor, Author)

This is a write-only workload, though; I have a different coordinator instance set up for reads.

@BertHartm (Contributor, Author)

Some more data points from other machines I spun up today. In all cases I have multiple Prometheus instances going to 1 m3coordinator, which writes to 6 dbnodes. All are on 0.4.3 with RF=2 and 1 namespace. The Prometheus instances come in pairs because they're set up as HA pairs.

A: 2 Prometheus: m3coordinator has 4 MB/s in, just under 100 MB/s out
B: 4 Prometheus (2 HA pairs): m3coordinator has 5.5 MB/s in, ~130 MB/s out
C: 2 Prometheus: 370 kB/s in, 8.15 MB/s out

Before I send traffic to the coordinator, I'm seeing about 70k in, 50k out.

It looks like these 3 setups have an out:in ratio closer to 25:1, which is a bit higher than I was seeing earlier.

@arnikola (Collaborator)

Hey, thanks for putting this together!

We've had a quick investigation and we've found the following:

  1. Traffic coming in from Prometheus is snappy encoded, while internal coordinator -> m3db traffic isn't, which can account for at least a 2x increase, depending on how well your data compresses.

  2. We currently expand tags and add them to the series ID, which duplicates the largest part of the incoming timeseries.

  3. Incoming Prometheus metrics are bundled together per series: i.e. all datapoints with the same tags arrive from Prometheus as one list with a single set of tags, but when the coordinator sends them to the db, there is one set of tags per datapoint.

We're currently looking at ways to make some easy wins here and will keep this ticket updated (the sketch below gives a rough feel for how these factors multiply).
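
Purely as an illustration of how those factors can stack up, here is a back-of-envelope sketch; every constant in it is an assumed, made-up number (not measured from M3 or Prometheus):

```go
// Back-of-envelope sketch only: every constant below is an assumed,
// illustrative number, just to show how the factors listed above multiply.
package main

import "fmt"

func main() {
	const (
		replicationFactor = 2.0   // each write is fanned out to RF db nodes
		snappyRatio       = 2.5   // assumed snappy compression ratio on the Prometheus side
		tagBytes          = 200.0 // assumed encoded size of one series' tag/label set
		pointBytes        = 16.0  // timestamp + value
		pointsPerSeries   = 5.0   // assumed datapoints per series in one remote-write batch
	)

	// Incoming: one tag set per series plus its datapoints, snappy compressed.
	in := (tagBytes + pointsPerSeries*pointBytes) / snappyRatio

	// Outgoing: tags re-sent once per datapoint, duplicated again inside the
	// expanded series ID, uncompressed, and fanned out to every replica.
	out := replicationFactor * pointsPerSeries * (2*tagBytes + pointBytes)

	fmt.Printf("rough out/in amplification under these assumptions: %.0fx\n", out/in)
}
```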

@arnikola (Collaborator)

My bad on letting this fall off for a while:

Our findings at the time were that we currently must duplicate the IDs, since we allow additional tags to be added while keeping the IDs the same. We discussed a solution where we'd send through an ID that would then be interpreted as the default ID on the m3 side, but unfortunately that fell by the wayside. We'll sync up early next year to see if we can build that functionality, and plan on encoding/batching coordinator metrics to try to drive the network load down.

arnikola self-assigned this Jan 2, 2019
@arnikola (Collaborator)

To follow up: this PR allows us to generate IDs in a better fashion, and after some consultation with the db folks, once it is approved I may go ahead and make that logic the default when a nil ID is passed in, which should address point 2 at least.

@comradedraganov

Has anyone looked into addressing this further? I'm experiencing a similar issue, with a ~30x increase in network traffic in my coordinators, as shown below (6 dbnodes, RF 3).
[attached screenshot]

@arnikola (Collaborator)

Hey, it's still planned/ongoing work; unfortunately it's not super high priority at the moment, so it's kind of on the back burner.

@comradedraganov

@arnikola Thanks, I might look at picking up one of the unaddressed points if I can (unless people have already made progress on them? I couldn't find anything related).

@yangxiangyu

[attached screenshot]

Same issue here.

@arnikola (Collaborator)

> @arnikola Thanks, I might look at picking up one of the unaddressed points if I can (unless people have already made progress on them? I couldn't find anything related).

Yeah, nothing ongoing or planned for the near future, unfortunately... If you were interested in having a look, I'd say pushing default ID generation (i.e. when identID here is nil) down to the DBNodes is probably the easiest and highest-impact change (it should reduce output size by close to 50%).
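
As a purely hypothetical illustration of that idea (this is not M3's actual code, types, or wire format), a dbnode could regenerate the default ID deterministically from the tags whenever no explicit ID arrives, so the duplicated ID bytes never have to cross the network:

```go
// Hypothetical sketch only: derive a default series ID from the sorted tags
// when the writer did not send one.
package idgen

import (
	"sort"
	"strings"
)

// Tag is a single name/value label pair.
type Tag struct {
	Name, Value string
}

// DefaultID returns the supplied ID if present, otherwise a deterministic ID
// built from the sorted tag pairs. A real implementation would also need to
// escape delimiters so that different tag sets can never collide.
func DefaultID(id string, tags []Tag) string {
	if id != "" {
		return id // the writer supplied a custom ID; keep it unchanged
	}
	sorted := append([]Tag(nil), tags...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })

	var b strings.Builder
	for _, t := range sorted {
		b.WriteString(t.Name)
		b.WriteByte('=')
		b.WriteString(t.Value)
		b.WriteByte(',')
	}
	return b.String()
}
```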

@fingon (Contributor) commented Nov 22, 2019

The low-hanging fruit we identified was simply adding LZ4 compression to the whole in-cluster protocol (including the m3coordinator -> m3db write path). With it, we saw ~92% bandwidth savings.

With a ~constant input rate, looking at ifconfig on the m3coordinator node:

3.5h run of 0.14.2 - Prometheus write traffic in is about 1/15 of the output sent to m3db:

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1460
        RX packets 602404  bytes 397168881 (378.7 MiB)
        TX packets 4546274  bytes 6260099487 (5.8 GiB)

15h run of 0.14.2 + LZ4 - Prometheus write traffic in is about 77% of the output sent to m3db:

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1460
        RX packets 2777579  bytes 1704216244 (1.5 GiB)
        TX packets 3712787  bytes 2214657932 (2.0 GiB)

Our design was to wrap the net.Conns used in tchannel and transparently compress/decompress all of the traffic, with the following logic (a rough sketch of this kind of wrapper is below):

  • gather up to a 4 MB block of data to be compressed
  • flush earlier if:
    • 5ms have passed since the last write, or
    • 30ms have passed since the first write that has not yet been sent

We're not sure about the CPU impact yet, but having this at least as an option would be very nice for us, because we pay by the byte for intra-AZ traffic.
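
A minimal sketch of that kind of wrapper, assuming the github.com/pierrec/lz4/v4 package and the thresholds described above (this is not the actual patch, and the peer would need a matching LZ4 reader for the inbound direction):

```go
// A minimal sketch, not the actual patch: wrap a net.Conn so that all
// outbound bytes pass through an LZ4 stream, flushed on size/time thresholds.
package lz4conn

import (
	"net"
	"sync"
	"time"

	"github.com/pierrec/lz4/v4"
)

const (
	maxBlock     = 4 << 20               // flush once ~4 MB has been buffered
	idleFlush    = 5 * time.Millisecond  // flush if no write for 5ms
	maxHoldFlush = 30 * time.Millisecond // flush 30ms after the first unsent write
)

type compressedConn struct {
	net.Conn // reads still go straight to the underlying conn in this sketch

	mu         sync.Mutex
	zw         *lz4.Writer
	buffered   int       // bytes written since the last flush
	firstWrite time.Time // time of the first unflushed write
	lastWrite  time.Time // time of the most recent write
}

// Wrap returns a connection whose writes are LZ4-compressed and flushed
// according to the thresholds above.
func Wrap(c net.Conn) net.Conn {
	cc := &compressedConn{Conn: c, zw: lz4.NewWriter(c)}
	go cc.flushLoop()
	return cc
}

func (c *compressedConn) Write(p []byte) (int, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	now := time.Now()
	if c.buffered == 0 {
		c.firstWrite = now
	}
	c.lastWrite = now

	n, err := c.zw.Write(p) // compressed and buffered, not yet on the wire
	c.buffered += n
	if err == nil && c.buffered >= maxBlock {
		err = c.flushLocked()
	}
	return n, err
}

// flushLoop periodically checks the time-based flush conditions. The ticker
// is never stopped here; a real implementation would tie it to the
// connection's lifetime.
func (c *compressedConn) flushLoop() {
	for range time.Tick(time.Millisecond) {
		c.mu.Lock()
		if c.buffered > 0 &&
			(time.Since(c.lastWrite) >= idleFlush || time.Since(c.firstWrite) >= maxHoldFlush) {
			_ = c.flushLocked()
		}
		c.mu.Unlock()
	}
}

func (c *compressedConn) flushLocked() error {
	c.buffered = 0
	return c.zw.Flush() // pushes the compressed block down to the real conn
}
```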

@robskillington (Collaborator) commented Nov 23, 2019

@fingon those are really fantastic results; we would be more than happy to get your change into master. We can even do a lot of the tests, etc. Happy to help out as much as we can to make it easy to get it in.

@fingon (Contributor) commented Jan 31, 2020

#2079 will eventually fix this. While probably slightly less efficient than my experimental branch (that we use in prod, cough cough), the PR's version will be backward and forward compatible and use less memory than our version.

@genericgithubuser

This would be very helpful to have available, either as referenced here or in #2079, since the huge bandwidth usage between our coordinator-write pool and m3db pool is currently our most heavily used resource.

@genericgithubuser

To improve the situation here, I had to go with the LZ4 route that @fingon did. It might be worth adding his approach until the longer-term options are completed.

Just to show some actual results: we're seeing a huge drop in network traffic (no config changes for inbound metrics).

[attached screenshot]

@fingon (Contributor) commented Sep 18, 2020

Why was this closed? @gibbscullen, it still isn't addressed; e.g. #2079 is not merged in.

@gibbscullen (Collaborator)

It was part of an effort to clean up stale issues. Will re-open to continue investigating.

gibbscullen reopened this Sep 18, 2020
@genericgithubuser

Since it looks like #2079 has been closed out without being implemented, would it make sense to improve things by adding @fingon's LZ4 compression approach? Is that something that would be accepted if an MR was put in with those changes, or are there any thoughts on next steps for this one?

@fingon (Contributor) commented Oct 10, 2020

We have since moved on to a Snappy-based solution with configuration in our own use. However, we can open a PR about that too if it is desired.
