Possible inaccuracies on (linux) bond interfaces #136
Not sure if this is due to averaging? If the traffic is bursty, our averaging calculation will make the numbers go wild. Looking at these screenshots, it seems we are always reporting a number lower than other tools, which is consistent with my hypothesis...
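To make the averaging hypothesis concrete, here is a minimal sketch (illustration only, with made-up numbers — not bandwhich's actual code) of why averaging over a fixed window under-reports bursty traffic compared to tools that sample the rate during the burst:

```rust
// Average a window of per-second byte counts, idle seconds included.
fn average_rate(bytes_per_second: &[u64]) -> u64 {
    bytes_per_second.iter().sum::<u64>() / bytes_per_second.len() as u64
}

fn main() {
    // A bursty 5-second window: 25 MB arrives in the first two seconds,
    // then the link goes idle for the remaining three.
    let window = [12_500_000u64, 12_500_000, 0, 0, 0];
    let avg = average_rate(&window);
    let peak = *window.iter().max().unwrap();
    println!("average: {} B/s, peak: {} B/s", avg, peak);
    // The windowed average (5 MB/s) sits well below the 12.5 MB/s that a
    // tool sampling during the burst would report.
    assert!(avg < peak);
}
```

If the other tools in the screenshots report something closer to the peak while we report the windowed average, that alone would explain consistently lower numbers.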
That was my guess too. The person on Twitter wasn't willing to participate in an issue here, so I think we'll just leave this open in case someone else encounters this issue and decides to be helpful.
Hey @amyspark, thanks for bringing this up. I hope you're okay with helping us debug this a little? To start, a few questions to help identify a possible cause for this:
@imsnif ,
Thanks @amyspark. I want to try to troubleshoot this to find out which part of the app is misbehaving. If sometime in the next few days I give you either a branch to compile or a compiled binary (whichever is more comfortable for you), would you be willing to run it? I essentially just want to cut out all parts of the app except for the traffic sniffer and see if it reports the total traffic correctly on your system.
Sure @imsnif! Let me know which branch and I'll test it right away.
So, this came up again in #155, and I'd like to try to get back to this and find the root cause. I still cannot reproduce this locally, but @TheLostLambda encountered this issue with large volumes of data. My first guess is that this is somehow related to an issue with the IP payload length reporting in libpnet. I made a branch that measures the size of the Ethernet frame rather than the IP packet, to see if this direction might be promising. @amyspark - I know it's been a while, my apologies for not getting to this, but if you'd be willing to check out the debug branch and give it a try, that would be great. @TheLostLambda - if you'd like to give it a go as well, that would also be great. Anyone else who'd like to try can run bandwhich locally the same way after checking out the branch. There will probably be a little more back and forth: if this is not the issue, I have some other guesses and things I'd like to look into. Thanks in advance for bearing with me. :)
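The guess above can be pictured with a small self-contained sketch (assumed byte layout, not libpnet code): counting only the IP packet's total-length field misses the 14-byte Ethernet header on every frame, so per-frame byte counts come out systematically low.

```rust
const ETHERNET_HEADER_LEN: usize = 14;

/// Read the IPv4 total-length field (bytes 2..4 of the IP header,
/// big-endian), which excludes the Ethernet framing around it.
fn ipv4_total_length(ip_header: &[u8]) -> u16 {
    u16::from_be_bytes([ip_header[2], ip_header[3]])
}

fn main() {
    // A minimal fake frame: 14-byte Ethernet header followed by a 20-byte
    // IPv4 header whose total-length field claims 1500 bytes (0x05DC).
    let mut frame = vec![0u8; ETHERNET_HEADER_LEN + 20];
    frame[ETHERNET_HEADER_LEN + 2] = 0x05;
    frame[ETHERNET_HEADER_LEN + 3] = 0xDC;

    let ip_len = ipv4_total_length(&frame[ETHERNET_HEADER_LEN..]) as usize;
    let frame_len = ETHERNET_HEADER_LEN + ip_len;
    println!("IP says {} bytes, on-wire frame is {} bytes", ip_len, frame_len);
    // Every counted packet is short by exactly the Ethernet header.
    assert_eq!(frame_len - ip_len, ETHERNET_HEADER_LEN);
}
```

A 14-byte-per-packet shortfall alone is small (~1% at full MTU), so this would only explain part of a large discrepancy — hence the back and forth below.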
Hi! I've given this a go and, unfortunately, the same issue persists: bandwhich reports ~4MBps when I'm pulling ~25MBps. Thanks so much for helping with the debugging!
Hrm... How about if you start bandwhich (in this branch or normally) with the --raw flag? If that still happens, I'll update the debug branch and try to remove everything except the packet sniffer and some text that updates the total bandwidth on screen - try to peel away as much stuff as possible.
No luck unfortunately, with either my wifi interface or a virtual Docker interface.
Hum... really more of a hunch than anything substantial, but how about if you kill the docker daemon, restart the interface and then try? |
A good call, but killing Docker and deleting the virtual interface didn't resolve the issue. The error does seem speed-dependent, though: there is less error if the connection is slower, even for the same size of file.
Ah well, we had to try. :) |
Sounds excellent! Thank you again for all of your hard work! |
@imsnif in my testing, upload now matches bmon's output. Download is constantly 0 (which it obviously is not).
Hum @amyspark, thanks for sticking around - that's quite odd! Alright, I updated the debug branch with a commit that comments out most of the app, leaving just the network sniffer threads and the keyboard input (so that quitting with ctrl-c or q would be possible). It now shows nothing on the screen, and upon quitting dumps the total and the bandwidth to the screen. The bandwidth it shows is an average per second of everything that happened since the app was started. Could you two please give this a try and see if the reporting is more on the mark for you now?
Unfortunately it's still acting up... I've been downloading a 1GB file with this command:

```
wget -O /dev/null http://ipv4.download.thinkbroadband.com/1GB.zip
```

wget reports around 20MBps, but this is what I got from bandwhich:
So the rate is still a bit slow and the total is unfortunately 800MB off :( Let me know if there is any more testing I can help with! |
Aha! This is actually good news, because it rules out most of the app. :) The problem is probably somewhere in the network sniffer. So, I updated the branch with a version that doesn't even parse the packets, but just counts the raw Ethernet frames on the wire. This will mix up and down speed, so you'll only see download (which is a sum of both), but since we're way off accuracy in your cases, I think we'd be able to tell if this made a difference nonetheless. Could you please check? (also, preferably with the --raw flag)
Thanks for all of the continued help! I've given things a go with the same wget command as last time (at around the same speed reported by wget) and got this back from bandwhich:
Still off unfortunately, but it's good that we are narrowing things down a little! |
@imsnif Tested it with Netflix's fast.com and it now matches my expected bandwidth (~400KB/s). |
For completeness, I've run my test again with fast.com: it gives 180Mbps (so 22.5MBps), and I'm only getting 4.31MBps from bandwhich. Limiting my speed to 400KBps, I still only get 113KBps from bandwhich.
Alright, we're getting somewhere. Seems like we have two different issues here. @amyspark - I created a new branch for you to try. @TheLostLambda - I made another try in a separate branch for you.
Mostly the same sort of results in my branch, unfortunately.
Thanks @TheLostLambda - I have some more ideas, I'll shoot some changes your way in the next few days if I haven't tired you out already. :)
No worries! I'm grateful for all of the help! |
I got led on a wild ride, but I think I squashed this bug in #157 (more detail in that PR). Please let me know if the fix works for you!

Additionally, out of curiosity, could both @imsnif and @amyspark post the output of this command:

```
ethtool -k <interface> | grep offload
```

Where `<interface>` is the interface you're testing with.
@TheLostLambda -- your branch is still wildly off my current bandwidth usage.
Hmm, definitely the one with pcap? Could you try running (as root):

```
# ethtool -K wlp4s0 tx off rx off gso off tso off gro off lro off
```

And checking if the usage is correct then? It would also be good if you could send the `ethtool -k` output regardless.
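For anyone following along, here is a rough illustration (with assumed sizes, not bandwhich's code) of why those offloads can break byte counting: with GRO/TSO the kernel hands the capture layer aggregated "frames" far larger than the 1500-byte MTU, and a counter that clamps each captured frame to MTU size (for example via a too-small capture buffer) silently drops the rest.

```rust
const MTU: u64 = 1500;

// A counter that (wrongly) assumes no frame exceeds the MTU.
fn count_clamped(frames: &[u64]) -> u64 {
    frames.iter().map(|&len| len.min(MTU)).sum()
}

// A counter that trusts the reported frame length.
fn count_exact(frames: &[u64]) -> u64 {
    frames.iter().sum()
}

fn main() {
    // One GRO-aggregated 64 KB super-frame instead of ~43 MTU-sized frames.
    let captured = [65_536u64];
    println!(
        "clamped: {} bytes, actual: {} bytes",
        count_clamped(&captured),
        count_exact(&captured)
    );
    // The clamped counter sees only a small fraction of the real traffic.
    assert!(count_clamped(&captured) < count_exact(&captured));
}
```

This is why turning the offloads off (so the capture layer sees ordinary MTU-sized frames again) can make the numbers line up, at the cost of some high-speed networking performance.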
Hah! That's really interesting. Great work tracking this down @TheLostLambda!!
I suspect @amyspark's issue comes from somewhere else, given that counting the raw bytes was accurate. I would love to check it further. Tbh, I would be okay with keeping the backend as is and having the offloading shutdown as a documented troubleshooting step for bandwhich. I suspect most people would not mind this as a solution, and if someone comes along who does, we can consider shipping this fix. I'm just a little wary of the bugs such a deep infrastructure change can introduce. What do you think?
Hmm, I don't know. I really think it would be better for Bandwhich to handle offloading correctly out of the box, as many other applications seem to do. Additionally, the offloading features are there to increase network performance, so toggling them off does have an adverse effect on high-speed networking. Personally, I trust pcap to be reliable, as it is the backend for Wireshark, tcpdump, and other very widespread programs. I can certainly understand the hesitancy to change out the backend, but it does manage to solve my problem on every one of my machines I've tested so far (currently 3), and I suspect that the number of people running into this problem will only increase as NICs become more advanced. I'd still like to see the swap to pcap make it into master, but I suppose I can always maintain a fork if need be.
I've got a new, less disruptive fix in #158 :) |
Hey @amyspark - I would be curious to see if @TheLostLambda's fix addresses your issue as well. I just merged it to master (haven't released it yet). If you have the time to check, that would be great. Otherwise I'd be happy to get to the bottom of the issue you're experiencing as well. Thanks! |
@TheLostLambda about the pcap issue, upload rate looks OK with offloading disabled. This is the output:
@imsnif -- your branch looks OK with @TheLostLambda's fix.
@amyspark - we just released a new version (0.13.0) - could you try with it and see if it works for you? |
@imsnif - no, 0.13.0 consistently underestimates (by a half or more) current traffic. Download rate is still locked at 0. |
Someone reported this on twitter: https://twitter.com/LinuxReviews/status/1218547448928444418. I don't have a lot more details unfortunately.