m3coordinator network usage disparity #949
Comments
@BertHartm Thanks for filing the issue! Could you provide a few more details, specifically:
I would expect <NETWORK_OUT> to be at least <REPLICATION_FACTOR> * <NETWORK_IN>, which would explain some of the discrepancy, but there might be more going on here.
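(As a rough check against the numbers in the original report, and assuming the replication factor of 2 confirmed below: replication alone would put the expected floor at about 2 * 9.5MB/s ≈ 19MB/s out, while the observed 165MB/s is roughly a 17:1 out-to-in ratio, so replication by itself does not explain the gap.)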
Sure, the replication factor is 2, and I only have 1 (unaggregated) namespace. I'll be spinning up a few other configurations today, so I'll try to post more statistics as I have them. Other anecdotal stuff that may be relevant: this setup has more outbound traffic than when I was running the coordinator as a sidecar on the prometheus box, but it also seems to cause a lower
Honestly, now that I think about this a bit more, it makes sense once you consider the read workload. If you're reading from this m3coordinator instance, then it is returning uncompressed data, but the data it receives from M3DB is compressed. So M3DB --> m3coordinator is compressed (counts towards received). I was only considering the write workload in my original response.
This is a write-only workload though. I have a different coordinator instance set up for reads.
Some more datapoints from other machines I spun up today. In all cases I have multiple Prometheus instances going to 1 m3coordinator going to 6 dbnodes, all on 0.4.3 with RF=2 and 1 namespace. The Prometheus instances come in pairs because they're set up for HA.

A: 2 Prometheus: the m3coordinator has 4MB/s in, just under 100MB/s out.

Before I send traffic to the coordinator, I'm seeing about 70k in, 50k out. It looks like these 3 setups have a ratio closer to 25:1, which is a bit higher than I was seeing earlier.
Hey, thanks for putting this together! We've done a quick investigation and found the following:
We're currently looking at ways to make some easy wins here; we'll keep this ticket updated.
My bad on letting this fall off for a while. Our findings at the time were that we currently must duplicate the IDs, since we allow additional tags to be added while keeping the IDs the same. We discussed a solution where we'd send through an ID which would then be interpreted as the default ID on the m3 side, but unfortunately that fell by the wayside. We'll sync up early next year to see if we can build that functionality, and plan encoding/batching of coordinator metrics to try and drive the network load down.
To follow up, this PR allows us to generate IDs in a better fashion, and after some consultation with the db folks, once it is approved I may go ahead and make that logic the default when passing in nil IDs, which should fulfill point 2 at least.
Hey, it's still planned/ongoing work; unfortunately it's not super high priority at the moment, so it's kind of on the backburner.
@arnikola Thanks, I might look at picking up one of the unaddressed points if I can (unless people have already made progress on them? I couldn't find anything related).
Yeah, nothing ongoing or planned for the near future unfortunately... If you were interested in having a look, I'd say pushing default ID generation (i.e. if
The low-hanging fruit we identified was simply sticking LZ4 compression on the whole in-cluster protocol (including the m3coordinator -> m3db write path). With it, we saw ~92% bandwidth savings at a roughly constant input rate, looking at ifconfig on the m3coordinator node:
3.5h 0.14.2 run - prometheus write traffic is about 1/15 of the output sent to m3db:
15h 0.14.2 + lz4 run - prometheus write traffic is 77% of the output sent to m3db:
Our design was to wrap the net.Conn:s used in tchannel and transparently compress/decompress all traffic, with the following logic:
We are not sure about the CPU impact yet, but having this at least as an option would be very nice for us, because we pay by the byte for intra-AZ traffic.
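For illustration only (not the commenter's actual patch), a minimal sketch of what a transparent compressing net.Conn wrapper could look like, assuming the github.com/pierrec/lz4/v4 package; the lz4Conn type, its constructor, and the flush-per-Write behaviour are assumptions here:

```go
package lz4conn

import (
	"net"

	"github.com/pierrec/lz4/v4"
)

// lz4Conn (hypothetical name) wraps a net.Conn so that everything written is
// LZ4-compressed and everything read is decompressed, transparently to callers.
type lz4Conn struct {
	net.Conn             // embedded for Close, deadlines, addresses, etc.
	r        *lz4.Reader // decompresses the inbound byte stream
	w        *lz4.Writer // compresses the outbound byte stream
}

func newLZ4Conn(c net.Conn) *lz4Conn {
	return &lz4Conn{
		Conn: c,
		r:    lz4.NewReader(c),
		w:    lz4.NewWriter(c),
	}
}

// Read returns decompressed bytes from the peer.
func (c *lz4Conn) Read(p []byte) (int, error) {
	return c.r.Read(p)
}

// Write compresses p and flushes it, so each RPC frame actually hits the wire
// instead of sitting in the compressor's buffer.
func (c *lz4Conn) Write(p []byte) (int, error) {
	n, err := c.w.Write(p)
	if err != nil {
		return n, err
	}
	return n, c.w.Flush()
}

// Close flushes any buffered compressed data before closing the socket.
func (c *lz4Conn) Close() error {
	_ = c.w.Close()
	return c.Conn.Close()
}
```

Both ends of the connection would have to apply the same wrapper, which is presumably why #2079 aims for a backward- and forward-compatible version instead.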
@fingon those are really fantastic results, we would be more than happy to get your change into master. We can even do a lot of the tests, etc.; happy to help out as much as we can to make it easy to get it in.
#2079 will eventually fix this. While probably slightly less efficient than my experimental branch (that we use in prod, cough cough), the PR's version will be backward and forward compatible and use less memory than our version.
This would be very helpful to have available, either as referenced here or in #2079, since the huge bandwidth usage between our coordinator-write pool -> m3db pool is currently our most heavily used resource.
To improve the situation here, I had to go with the LZ4 route that @fingon did. It might be worth adding his approach until the longer-term options are completed. Just to show some actual results, we're seeing a huge drop in network traffic (no config changes for inbound metrics).
Why was this closed? @gibbscullen it still isn't addressed, e.g. #2079 is not merged in.
It was part of an effort to clean up stale issues. Will re-open to continue investigating.
We have since moved on to a Snappy-based solution with configuration in our own use. However, we can open a PR about that too if it is desired.
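Assuming the same wrapper shape as the LZ4 sketch above, switching to Snappy would mostly be a matter of swapping the stream constructors, e.g. with github.com/golang/snappy (the snappyConn type and constructor are hypothetical):

```go
package snappyconn

import (
	"net"

	"github.com/golang/snappy"
)

// snappyConn (hypothetical) mirrors the LZ4 wrapper above, but with Snappy streams.
type snappyConn struct {
	net.Conn
	r *snappy.Reader
	w *snappy.Writer
}

func newSnappyConn(c net.Conn) *snappyConn {
	return &snappyConn{
		Conn: c,
		r:    snappy.NewReader(c),         // decompresses the inbound stream
		w:    snappy.NewBufferedWriter(c), // compresses outbound; supports Flush/Close
	}
}
```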
I moved m3coordinator to a dedicated box, so that prometheus is writing to it over the network and it in turn writes to 6 m3db instances over the network.
I would expect network traffic to be roughly equivalent, but I'm seeing about 9.5MB/s coming into the coordinator and 165MB/s going back out, which seems surprisingly disproportionate.
I'm running 0.4.1 as released