pool4 is underperforming #214
New status: before getting to profiling, I wanted to make sure I had a clear starting point to optimize from, so I tried extracting more data. I wound up with a different opinion on what's going on. Here's the data. (See the README for details.) From the fact that the "ip6tables full" curve stays stubbornly below "pool4 full", it looks like pool4 is actually less critical to performance than ip6tables. (At least when populated with 1500 rows.) That is, I know the pool4 entry lookup can be optimized, but I don't think this will speed up translation as much as Pier hopes. (@pierky: Do you get different results than me if you prolong the tests through several …
Added a new by-address index, so there's no need to iterate sequentially over anything now. The new pool4 is unit-tested and looking stable, but there are two TODOs preventing the code from being usable:

1. The API changed a little and I still need to tweak the callers.
2. I can't use RCU anymore and all the locking code is commented out.

Progress on #214.
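For illustration, here's a minimal sketch of the by-address-index idea; the names (`pool4_entry_sketch`, `find_by_addr`) are hypothetical rather than the branch's actual identifiers. The point is that a red-black tree keyed by IPv4 address makes the lookup O(log n) instead of a sequential walk over every row:

```c
/* Hypothetical sketch, not Jool's actual code: key pool4 entries by IPv4
 * address in a red-black tree so a lookup costs O(log n) instead of a
 * sequential walk over every entry. */
#include <linux/rbtree.h>
#include <linux/in.h>

struct pool4_entry_sketch {
	struct rb_node node;   /* links this entry into the by-address tree */
	struct in_addr addr;   /* the tree's key */
	__u16 port_min, port_max;
};

static struct pool4_entry_sketch *find_by_addr(struct rb_root *root,
		const struct in_addr *addr)
{
	struct rb_node *n = root->rb_node;

	while (n) {
		struct pool4_entry_sketch *e;

		e = rb_entry(n, struct pool4_entry_sketch, node);
		/* Any consistent ordering works as a tree key; comparing the
		 * raw big-endian words is enough for this sketch. */
		if (addr->s_addr < e->addr.s_addr)
			n = n->rb_left;
		else if (addr->s_addr > e->addr.s_addr)
			n = n->rb_right;
		else
			return e; /* exact match after at most O(log n) steps */
	}
	return NULL; /* address not owned by pool4 */
}
```

The actual branch may organize things differently; the relevant property is that the lookup no longer scales linearly with the number of pool4 rows.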
Was missing:
- Locks.
- Fall back to interface addresses when pool4 is empty.
- Fix the API users.
- A flush unit test; --flush was crashing pool4.

Looks stable, but I've only run unit and informal integration tests. Fixes #214.
I performed some tests using the current test branch (ba4e7db); I used the same hardware and scenario as in my previous tests (100 Mbps NIC, see #175), but I introduced the mark-randomizer module. Tests were performed using 20 cycles of iperf.
Tests 2 and 3 brought the CPU to 100% because of hardware interrupts; test 1 only to 52%.
Oh wow, we were planning to report at the same time :) Before I analyse your data, I'd like to report my tests on the new code. pool4 looks a lot faster now, at least compared to ip6tables. I even got rid of those annoying waves somehow. Note that the code was forked from the Jool 3.5 development branch, which might not be production-ready yet.
Oh, well, I'd have to run the tests again, but I can believe it. I take it you think of that as a problem? Is pool4 getting too full? Jool selects ports based on algorithm 3 of RFC 6056. This algorithm degrades "mostly" gracefully, because reserved ports tend to scatter themselves randomly across the pool4 domain, which means that when there is a collision, finding a nearby unused port is relatively fast. Until most ports are reserved, that is. When pool4 is completely reserved, for example, the processor will waste a lot of time looping through the whole pool4 domain looking for an unused port. This is an approximate representation of how port selection should degrade as the number of reserved bindings reaches the limit imposed by pool4:

Maybe this can be optimized, too.
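To make the degradation concrete, here is a simplified, self-contained sketch of RFC 6056's algorithm 3 (simple hash-based port selection). The hash and the `in_use()` check are toy stand-ins for Jool's keyed hash and BIB lookup; the loop is the relevant part — once nearly every port is reserved, each new connection approaches a full scan of the pool4 port range:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins so the sketch compiles on its own; the real code would
 * consult the BIB and use the keyed hash F(src, dst, key) from RFC 6056. */
static bool reserved[65536];
static bool in_use(uint16_t port) { return reserved[port]; }
static uint32_t hash_tuple(uint32_t tuple) { return tuple * 2654435761u; }

static int choose_port(uint32_t tuple, uint16_t min, uint16_t max,
		uint16_t *result)
{
	uint32_t range = (uint32_t)max - min + 1;
	uint32_t offset = hash_tuple(tuple) % range;
	uint32_t count;

	for (count = 0; count < range; count++) {
		uint16_t port = min + (uint16_t)((offset + count) % range);
		if (!in_use(port)) {
			*result = port; /* first free port at/after the hashed offset */
			return 0;
		}
	}
	return -1; /* every port reserved: we just walked the whole range */
}
```

When the range is mostly free, the loop exits after a handful of iterations; when it is nearly exhausted, `count` grows toward `range`, which is the cliff described above.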
I'll run new tests too: my last ones used a short range of ports per pool4 entry (~30), so collisions may have negatively impacted performance. I'll try with more reserved ports per entry.
We might be looking at a different problem than the port selection peak, actually.
Edit: Actually, scratch that idea. If you ran 20 iperf calls, then each client is only using 20 out of the 30 pool4 addresses. Assuming the mark randomizer didn't cause clients to be mapped to marks that didn't belong to them, then you are not exhausting pool4 entries. On the other hand, is it really degrading badly? I see two sort-of-comparable numbers (92.7 and 35.4/35.9), but that's not enough to draw a curve.
(See edits above.) Questions: …
Actually, I don't have a real target client count; I just wanted to see how much better the new code performs and, as you also said, it's very good to see how much faster it is now. Edit: the "do you have similar results" was not related to CPU usage; I just wondered whether you got the same performance improvement with the new code. Sorry, my fault.
Oh, ok. Thank you. :) Then I guess that's it for the moment. I'll go back to tweaking the other issues.
So I noticed the other day that the new bottleneck (ip6tables accessing thousands of entries sequentially) can also be addressed by using a particular variation of the MARK target. The basic idea is, we might not be able to prevent ip6tables from walking through the whole database, but we can condense several ip6tables entries into one. The following graph shows the Mbit/s I gained by swapping 2048 … Here are the details of the experiment.
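To sketch the condensation idea (hypothetical code, not the actual target's implementation): if clients are assigned consecutive /64s under a common base prefix and consecutive marks, a single target can compute the mark from the packet's source address, so one ip6tables rule can stand in for thousands of per-client MARK rules.

```c
#include <stdint.h>
#include <netinet/in.h>

/* Illustrative only: derive a mark from an IPv6 source address, assuming
 * each client owns one /64 under a common base prefix and marks are assigned
 * consecutively starting at base_mark. */
static uint64_t prefix64(const struct in6_addr *a)
{
	uint64_t v = 0;
	for (int i = 0; i < 8; i++)
		v = (v << 8) | a->s6_addr[i]; /* first 64 bits, big-endian */
	return v;
}

static uint32_t mark_from_src(const struct in6_addr *src,
		const struct in6_addr *base_prefix, uint32_t base_mark)
{
	/* The distance between the two /64s is the client's index. */
	return base_mark + (uint32_t)(prefix64(src) - prefix64(base_prefix));
}
```

With something like this behind a single rule, each packet matches one rule regardless of the client count, instead of traversing one rule per client.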
So, very good results here using the brand new … I performed some measurements in the following scenario:
The NAT64 is always the same hardware; this time it's connected via GbE to two other hosts. Using the old configuration (6400 jool/ip6tables rules, without …). In the new configuration, I used the same branch of Jool I already used in my previous test (ba4e7db), this time with 6656 pool4 entries (…). The sender's IPv6 address falls in the last ip6tables … Using iperf from …
These are the same values I obtained using iperf between …
Does this mean that translation adds no overhead whatsoever? I find this a little too impressive (as in "worrying").
I find it odd too; that's why I wanted to report it here. I only had a little time to run these tests, and I spent most of it setting up the new scenario with the two GbE-enabled hosts (… Unfortunately, I'll have to wait until next week, when I'll spend some time on it again; now that the lab is already up, I'll have more time for the real measurements. First of all I'll double-check every step and capture and write down all of the aforementioned metrics. In the meantime, any suggestion or hint to dispel this doubt will be very much appreciated. :-)
Well, I'm guessing it most likely won't add much valuable insight, but maybe the BIB/session tables can also be queried to validate that Jool is actively doing what we expect it to.
I forgot to mention it, but I also checked the … Yes, I feel like I'm missing something very big; maybe I'm not seeing the forest for the trees here.
Thanks :)
So, everything seems fine with the results I got last week. Packets from …
IPv6-to-IPv6 tests from … IPv6-to-IPv4 tests from … So, CPU3 (= v6 interface interrupts) rises from 20% to 90%, but traffic keeps flowing without problems. I tried to put the ip6tables rule that matches the …
Thank you for your efforts! I'm guessing a single client is simply unable to saturate the network now that, assuming this configuration, Jool has stopped being the bottleneck. If more clients and bandwidth are added to the mini-DoS attack, the CPUs are probably going to start hobbling. Also, … (But I'd say that is outside the scope of this issue.)
3.5 released; closing.
Started here. If my theory is correct, the bug is not tied to `--mark`; it's tied to the amount of rows pool4 contains.

Even if `--mark` is at fault, this bug should be addressed because, even though RFC 7422 and `--mark` are (for most purposes) roughly the same feature, and the former should naturally scale more elegantly as more clients need to be serviced, the latter is more versatile since it matches clients arbitrarily by means of iptables rules. So I don't see any reason to drop `--mark` once RFC 7422 is implemented.

I believe this is the symptom that needs to be addressed: