testing the behavior of small queue building non-ecn'd flows #148
maybe we can flow-dissect arp? |
Is there an argument for lowering the default interval for when flent calls irtt? The default irtt packet length with no payload is 60 bytes, so here are bitrates at various intervals for IPv4+Ethernet (106 byte frames, and tripled for RRUL's three UDP flows):
200ms => 4.2 Kbit * 3 = 12.7 Kbit
100ms => 8.5 Kbit * 3 = 25.4 Kbit
50ms => 17.0 Kbit * 3 = 51 Kbit
20ms => 42.4 Kbit * 3 = 127.2 Kbit
10ms => 84.8 Kbit * 3 = 254.4 Kbit
50ms wouldn't be too disruptive in most cases. At 1 Mbit, the 5% of bandwidth threshold is crossed. Bitrates could also be lowered by ~15% (16 bytes per packet) by passing in --tstamp=midpoint and sacrificing the server processing time stat. I'd also like to see packet loss (up vs down separately) shown by default, somehow. :) |
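For anyone who wants to rerun the arithmetic for other intervals or packet sizes, a minimal sketch of how the numbers above are derived (assuming the same 106-byte on-wire frame and three concurrent UDP flows; nothing flent-specific here):

```python
# Probe bandwidth for irtt measurement flows at a given send interval.
FRAME_BYTES = 106   # 60-byte irtt packet + UDP/IPv4/Ethernet overhead
FLOWS = 3           # RRUL runs three UDP measurement flows

def probe_kbit(interval_ms, frame_bytes=FRAME_BYTES, flows=1):
    """Bitrate in Kbit/s for probe flows sending one frame per interval."""
    packets_per_sec = 1000.0 / interval_ms
    return packets_per_sec * frame_bytes * 8 * flows / 1000.0

for ms in (200, 100, 50, 20, 10):
    print(f"{ms}ms => {probe_kbit(ms):.1f} Kbit * {FLOWS} = "
          f"{probe_kbit(ms, flows=FLOWS):.1f} Kbit")
```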
On Sat, Aug 25, 2018 at 10:54 AM Pete Heist ***@***.***> wrote:
> Is there an argument for lowering the default interval for when flent calls irtt?
> 50ms wouldn't be too disruptive in most cases. At 1 Mbit, the 5% of bandwidth threshold is crossed.
What I'm proposing here is, in part, "rrul_v2". My original rrul spec specified 20ms intervals for the isochronous flows. Originally, incidentally, voip ran at 10ms intervals, but that got relaxed due to then-practical limits.
I wouldn't mind having an even more aggressive test that did 2.7ms intervals, which is as low as opus can go.
And lest you think I'm being extreme... when I was a kid... switched telephony ran at lightspeed - a call across town felt and acted the same as whispering in your lover's ear - which I did. 20ms is the equivalent of having a conversation 20 feet across the room.
I've always thought 200ms sampling was wayyyyy too coarse (see Nyquist) and 200usec about right. :)
As for packet overhead... well, I'm pretty sure irtt is less than opus already.
As for 1mbit overheads... well... I care more about wifi, and speeds much greater than 5mbit, nowadays.
--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
|
and I'd make the new tests depend on irtt specifically, because they are
different from ping.
we could, I guess, make the tighter irtt emission and sampling behavior
global with a -v2 or -v1 parameter to flent (-v2 becoming the default and
breaking with an error message when irtt was not present) - and keep the
test names as-is.
|
Yeah, I haven't thought about what it would mean to change the semantics of the existing tests by changing the default interval. Although, as it is, there's still a fallback to UDP_RR for current tests, so results can change if irtt isn't installed or the server isn't reachable for some reason. I'm fine with a 20ms default interval, but that could affect folks testing on lower rate ADSL. Sub-10ms intervals work, so 2.7ms intervals should be no problem. 200µs still functions on decent hardware, but below that isn't much good, as any number of things can cause latencies at that scale.
|
Hi Pete,
On Aug 25, 2018, at 19:53, Pete Heist ***@***.***> wrote:
> 50ms wouldn't be too disruptive in most cases. At 1 Mbit, the 5% of bandwidth threshold is crossed.
Would it be possible to simply also show this bandwidth use in the bandwidth plots (say, as a single data series accumulated over all irtt probes)? Then the user could select the interval and still be able to easily assess the effects on the other bandwidth flows. I believe that would also be great for the netperf UDP/ICMP streams, but it is probably harder to implement.
Best Regards
Sebastian
|
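Not flent code, just a rough sketch of that idea: synthesize one aggregate "measurement traffic" series from the nominal probe parameters (the interval and on-wire frame size below are assumptions; real irtt samples could be used instead), which could then be overlaid on a bandwidth plot:

```python
# Sketch: build a flat time series for the aggregate probe traffic so its
# bandwidth cost can be shown alongside the real flows.
def measurement_series(duration_s, step_s, probes):
    """probes: list of (interval_s, frame_bytes), one entry per measurement flow.
    Returns (times, mbit_per_s) for the combined probe traffic."""
    agg_bps = sum(frame_bytes * 8 / interval for interval, frame_bytes in probes)
    times = [i * step_s for i in range(int(duration_s / step_s) + 1)]
    return times, [agg_bps / 1e6] * len(times)

# Example: three irtt flows at 50 ms intervals with 106-byte frames
times, mbps = measurement_series(60, 0.2, [(0.05, 106)] * 3)
print(f"aggregate probe load: {mbps[0] * 1000:.1f} Kbit/s")  # ~50.9 Kbit/s
```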
On Aug 27, 2018, at 1:40 PM, flent-users ***@***.***> wrote:
> Would it be possible to simply also show this bandwidth use in the bandwidth plots (say, as a single data series accumulated over all irtt probes)?
We could, but I would think what might happen is that in most cases you’ll have a line that appears close to 0 Mbit, relative to other flows, and we probably wouldn’t want to change the scaling for the scaled bandwidth plots to accommodate that...
Pete
|
Hi Pete,
On Aug 27, 2018, at 14:14, Pete Heist ***@***.***> wrote:
> We could, but I would think what might happen is that in most cases you’ll have a line that appears close to 0 Mbit, relative to other flows, and we probably wouldn’t want to change the scaling for the scaled bandwidth plots to accommodate that...
Well, we already cap the max (we do not show all individual sample values); we could do the same for the time measurement plots, so they are only revealed in non-scaled mode. I guess that is a slippery slope, because the next thing to add would be the ACK traffic for each TCP stream....
I guess if it were useful (and easy), Toke would have added it already ;)
Best Regards
Sebastian
|
Pete Heist <[email protected]> writes:
> We could, but I would think what might happen is that in most cases
> you’ll have a line that appears close to 0 Mbit, relative to other
> flows, and we probably wouldn’t want to change the scaling for the
> scaled bandwidth plots to accommodate that...
I think the important part of this is that it would be reflected in the
total bandwidth score. I've been meaning to implement this for a while
for netperf UDP_RR, because otherwise you can get spurious bandwidth
drops as latency decreases just because the latency measurement flows
take up more bandwidth. But, well, irtt sort of made that less urgent ;)
But I could revisit; I do believe I already capture the data from irtt
(I think?).
-Toke
|
Ok, well if we do go for it, so far in irtt's JSON there's just an average |
I really do care about measuring packet loss and re-orders accurately. I've also been fiddling with setting the tos field to do ECT(0), ECT(1) and CE. Doing that at a higher level, and noting the result, would be good. --ecn 1,2,3? Plus a summary line along the lines of "forward/backward path stripping dscp", "CE marks". |
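For reference, the ECN codepoint lives in the two low-order bits of the IP TOS byte, so fiddling with it from userspace is a socket option away. A minimal Python sketch (not something irtt or flent expose today; the destination and port are placeholders):

```python
import socket

# ECN codepoints in the two low-order bits of the TOS byte (RFC 3168)
NOT_ECT, ECT_1, ECT_0, CE = 0x00, 0x01, 0x02, 0x03

def udp_socket_with_ecn(ecn=ECT_0, dscp=0):
    """UDP socket whose outgoing packets carry the given ECN codepoint
    (and optionally a DSCP in the upper six bits). On Linux, datagram
    sockets pass these bits through; TCP manages its own ECN bits."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, (dscp << 2) | ecn)
    return sock

s = udp_socket_with_ecn(CE)               # mark outgoing probes CE
s.sendto(b"probe", ("192.0.2.1", 2112))   # placeholder destination
```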
on plotting stuff I could see adding a 4th graph much like TSDE's for loss and reorder. |
actually - and I can see pete running screaming from the room - we could add tcp-like behavior to irtt and obsolete netperf entirely, except for referencing the main stack. The main reason we use netperf is that core linux devs trusted it, and the reason we only sample is that timestamping each packet and extracting stats from it is hard in light of mss handling and the complexity of the netperf codebase. Implementing tcp-like behavior and tcp-like congestion controllers on top of irtt seems simpler in comparison, and we already have better timestamp facilities than tcp in irtt. Who here likes playing the Zerg as much as I do? |
As for packet loss and reorders, there's the
What's TSDE?
As for tcp'ish irtt, I think I need to go canicross the dog in the forest before I internalize that. :) Although I bet per-packet RTTs would be invaluable for investigating ecn?
|
Ah, I see TSDE is Pollere's work. I need to go through the talks referenced on pollere.net asap to get smarter on that. Will be on some roofs today though, p2p connection for the neighbors... |
this convo is (purposefully) all over the place, but I'm leaning towards a rrul_v2 test with 10ms irtt intervals. Not clear to me if flent could deal with two different sample rates. Also perhaps an IRTT_REQUIRE flag: --te=irtt=1 |
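Whatever spelling that flag ends up with, the check itself is simple enough; a hypothetical sketch (not actual flent code, and the parameter name is invented here) of refusing to run rather than silently falling back to netperf UDP_RR:

```python
import shutil
import sys

def require_irtt(required=True):
    """Abort instead of silently falling back to netperf UDP_RR when a
    test (e.g. a hypothetical rrul_v2) insists on irtt being available."""
    if required and shutil.which("irtt") is None:
        sys.exit("this test requires irtt, but no 'irtt' binary was found in PATH")

require_irtt()  # would be gated on a test parameter such as IRTT_REQUIRE
```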
Another rrul_v2 issue would be to correctly end up in all the queues on wifi. |
Hi Dave,
On Sep 3, 2018, at 18:00, Dave Täht ***@***.***> wrote:
> Another rrul_v2 issue would be to correctly end up in all the queues on wifi.
So in theory rrul_cs8 should do that... (it aims to use just one dscp-marked flow per class selector, for a total of 8 tcp flows per direction...) In practice I believe the mapping from dscps to ACs is highly non-linear...
Best Regards
Sebastian
|
flent-users <[email protected]> writes:
> So in theory rrul_cs8 should do that... (it aims to use just one dscp-marked flow per class selector, for a total of 8 tcp flows per direction...) In practice I believe the mapping from dscps to ACs is highly non-linear...
Well, not that non-linear:
const int ieee802_1d_to_ac[8] = {
IEEE80211_AC_BE,
IEEE80211_AC_BK,
IEEE80211_AC_BK,
IEEE80211_AC_BE,
IEEE80211_AC_VI,
IEEE80211_AC_VI,
IEEE80211_AC_VO,
IEEE80211_AC_VO
};
|
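To make the cs8 point concrete: as far as I understand the default case, the kernel takes the top three bits of the DSCP as the 802.1d priority and runs it through that table, so the eight class selectors spread over the four ACs as sketched below (an illustration of that reading, not an authoritative statement about every driver):

```python
# 802.1d priority -> wifi access category, mirroring ieee802_1d_to_ac above
IEEE8021D_TO_AC = ["BE", "BK", "BK", "BE", "VI", "VI", "VO", "VO"]

def cs_to_ac(cs):
    """Map class selector CSn (DSCP value n << 3) to an access category,
    assuming the default 'top three DSCP bits = 802.1d priority' rule."""
    dscp = cs << 3          # CSn has DSCP value n*8
    priority = dscp >> 3    # top three bits of the 6-bit DSCP
    return IEEE8021D_TO_AC[priority]

for cs in range(8):
    print(f"CS{cs} -> {cs_to_ac(cs)}")
# CS0->BE, CS1->BK, CS2->BK, CS3->BE, CS4->VI, CS5->VI, CS6->VO, CS7->VO
```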
Hi Toke,
On Sep 3, 2018, at 19:37, Toke Høiland-Jørgensen ***@***.***> wrote:
> Well, not that non-linear:
Well, aren't these values according to IEEE P802.1p (https://en.wikipedia.org/wiki/IEEE_P802.1p)?
PCP value  Priority     Acronym  Traffic types
1          0 (lowest)   BK       Background
0          1 (default)  BE       Best effort
2          2            EE       Excellent effort
3          3            CA       Critical applications
4          4            VI       Video, < 100 ms latency and jitter
5          5            VO       Voice, < 10 ms latency and jitter
6          6            IC       Internetwork control
7          7 (highest)  NC       Network control
These map the 3-bit PCP priority values from VLAN tags to ACs, but note the dance with PCP 1 being lower priority than PCP 0, and more importantly the differing interpretations of PCP 2: is it "excellent effort" or another BK code point?
I guess the point I wanted to make is that mapping down from the 6-bit DSCP to ACs is not very intuitive (with a linear mapping being the "most intuitive").
Anyway, I am totally fine with just using 3 bits, this is still plenty for priority hierarchies that I can still understand ;)
Best Regards
Sebastian
|
flent-users <[email protected]> writes:
> I guess the point I wanted to make is that mapping down from the 6-bit DSCP to ACs is not very intuitive (with a linear mapping being the "most intuitive").
Oh, it's absolutely a mess. So much so that the IETF had to write a
whole RFC on it: https://tools.ietf.org/html/rfc8325
-Toke
|
@chromi @heistp @jg @richb-hanover
Our tests with typical sampling rates in the 200ms range are misleading. We (until the development of irtt) have basically been pitting request/response traffic against heavy tcp traffic, and I think it's been leading us to draw some conclusions that are untrue for many other kinds of traffic, particularly with ecn enabled and the collateral damage it might cause.
The kerfuffle over systemd/systemd#9748 and systemd/systemd#9725 is a symptom of my uneasiness.
I'm probably the only one that regularly runs flent with a 20ms sampling interval. Queues do finally build in this case for voip-like traffic, and we end up in the "slow" queue; even the "fast" queue gets more than one packet to deliver.
Having to prioritize arp slightly, as cake does in diffserv mode, is one symptom; having to (as I've done for years now) ecn-mark babel packets on a congested router is another. Other routing protocols that don't use IP will also always end up in a fixed queue.
In an ecn'd world, I've long thought a "special" "1025th" queue for things like arp is possibly needed. Right now arp maps to the "0th" queue and can collide. There are other protocols not handled by the flow dissector either.
tracking packet loss better for the measurement flows would comfort me A LOT (having a graph mixin that could pull that data out? see the sketch after this list)
a rrul_v2 test that always did the 20ms irtt thing would be good
a test that pitted ecn'd flows against non-ecn'd flows would be good
a fixed-rate, non-ecn'd, but queue-building flow mixin (sort of like what babel does to me now). Toke picks on me for using babel on workloads like this; I view it as a subtle reminder that real networks are not like a lab.
syn repeats?
RTO tracking?
tests with heavy flows going, plus squarewave tests
In the latest string of extreme tests - a hundred flows started simultaneously at 100mbit - I was also regularly able to get some of them to wind up in ecn fallback mode.
Using "flows 32" for fq_codel with ecn was often "not good" from the perspective of my (non-ecn'd) monitoring flow; things like "top" would have their output pause half-screened.