NAT puncturing infrastructure #2754
Master Thesis work on the NATLab can be found in the following repositories:
This comment will be updated to contain quick links to the various parts of my master project. These same links can be found on the metaproject repository README. |
I am the guy in #2623. I will give you an independent walker later, with no dependency on Dispersy or Tribler. My email address is [email protected]; send me an email if you want. |
@synctext Pushed my current Ansible code to the develop branch. This code runs for me on my staging server. The basic structure is there, and I'm able to create multiple VMs with it if I specify them in the host_var file for the server. I now have a thorough understanding of how Ansible works, and will be extending this code in the coming weeks. The next goal is to auto-provision the nodes with users and SSH keys. I found out that Ansible can take someone's public key from GitHub and put it directly into auth_keys. We have to discuss whether we want to forward the SSH ports of the VMs to the outside, or only allow tunneling through our main server. Both methods have pros and cons. About the NATLab server: I have updated the server to version 4.4, and it looks nice. I was able to power-cycle the NATBoxes, and I tried to connect to 4 of them by manually setting up tunnels. Of the 4 boxes, I was able to connect to 2; 1 was dead, and 1 didn't let me log in. Tomorrow I'm going to check the rest of the boxes, and see if I can reset the boxes that have issues. |
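A minimal Python sketch of the key-provisioning idea mentioned above, for illustration only. In Ansible itself this would normally be left to a module rather than hand-written code. The sketch assumes that GitHub serves a user's public keys as plain text at `https://github.com/<username>.keys`; the helper names are hypothetical and not part of the natlab-ansible code.

```python
from __future__ import annotations

from urllib.request import urlopen


def github_keys_url(username: str) -> str:
    """URL where GitHub serves a user's public SSH keys as plain text."""
    return f"https://github.com/{username}.keys"


def fetch_github_keys(username: str, timeout: float = 10.0) -> list[str]:
    """Download the public keys of a GitHub user (needs network access)."""
    with urlopen(github_keys_url(username), timeout=timeout) as response:
        text = response.read().decode("utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def merge_authorized_keys(existing: str, new_keys: list[str]) -> str:
    """Return authorized_keys content with new_keys appended, skipping duplicates."""
    lines = [line for line in existing.splitlines() if line.strip()]
    seen = set(lines)
    for key in new_keys:
        if key not in seen:
            lines.append(key)
            seen.add(key)
    return "\n".join(lines) + "\n" if lines else ""
```

Keeping the merge step separate from the download makes it idempotent: running it twice with the same keys leaves the file content unchanged, which is the behaviour one would expect from a provisioning step.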
the fun stuff: https://github.com/remkonaber/natlab-ansible/blob/develop/roles/natnode_create/tasks/main.yml
Tasks for coming weeks:
The fun part in the future is still: open 1000 listen ports on two sides and brute-force a connection across a symmetrical NAT box. So find the hardware and firmware settings that put a box in this nasty mode.
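To get a feeling for why brute force with many ports can work, here is a small back-of-the-envelope sketch. It assumes (this is an assumption, not a measured property of any NAT box in the lab) that the NAT picks external ports uniformly at random from a space of 65535 ports, that one side holds `open_ports` mappings open, and that the other side probes `probes` distinct random ports.

```python
def hit_probability(open_ports: int, probes: int, port_space: int = 65535) -> float:
    """Chance that at least one of `probes` distinct random ports hits one of
    `open_ports` mapped ports, under a uniform-random port allocation model."""
    if open_ports + probes > port_space:
        return 1.0  # pigeonhole: the probes must overlap the open ports
    miss = 1.0
    for i in range(probes):
        # Probability that probe i also misses, given all earlier probes missed.
        miss *= (port_space - open_ports - i) / (port_space - i)
    return 1.0 - miss


if __name__ == "__main__":
    for n, k in [(256, 174), (256, 1024), (1000, 1000)]:
        print(f"{n} open ports, {k} probes: {hit_probability(n, k):.4f}")
```

Under this model the success chance grows quickly with the product of open ports and probes, which is the birthday-paradox effect. Real NAT boxes may allocate ports sequentially or from a smaller range, so measured numbers can differ.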
|
@synctext The branch develop is updated with the current code. The NATnodes are completely set up (see screenshot below), and the Ansible code to create and remove the nodes is working well. Initial provisioning is working as well; more will be added as needed: I read up on Jenkins, and created my own Jenkins install + Jenkins slave on my staging server. I created a simple "Hello World" pipeline, and let it run on the slave: I also spent time on Gumby. Created my own test experiment, and tested the dummy experiment (Hello World style). Figured out how to run it locally, and how to run it on 1 remote. Documentation for Gumby is severely lacking, so this took me quite some time. However, it still seems the best choice. The integration of the Jenkins + Slave >> Gumby + NATNodes isn't done yet. This consists of 2 parts that still have to come together:
|
Gumby has multiple semi-identical Bash scripts for each experiment which configure servers. Ansible is specifically designed for getting a server ready for Dispersy or Tribler. Thus Gumby Each LXC container controlling 1 NAT box and having 1 IPv4 is a clean Ubuntu 16.04. Thus needs Python + all other stuff installed. This thesis uses a wipe-upon-completion security policy. Thesis material: provisioning pipeline, password-less remote login, jenkins remote login, remote access to web interface of NAT boxes, ssh tunneling, etc. |
Coming to install more routers tomorrow; 7 out of 31 have arrived. Hopefully this number will increase, but there seems to be a slight issue in the supply line. I will prepare the 4 new shelves, and the 7 routers that have arrived. The rest will be installed at a later date. I believe I was able to get part of the dummy/remote experiment of Gumby working using 3 nodes of the NATLab. Gumby was able to copy the workspaces to the other nodes, and then started to run the dummy sleep commands on each node. The commands ran, but then it hung for unknown reasons. At first I thought it was waiting on the tracker_cmd, but I tried removing it and I get the same hang. I think I have narrowed it down to an error that has to do with the locale settings in the LXC container while setting up a virtualenv. For some reason the locale isn't set correctly, so I'm now looking for a way to fix that when I spin up the VM in the first place. I now need a real project to test Gumby with, to understand Gumby better. I had a look in the Tribler Jenkins to see if I could find a project I could run, but I just couldn't find anything simple enough to test with. I figure I'll create my own Hello World project that outputs something simple to understand, so I can learn Gumby step by step. This could then also serve as an example of how to run an experiment on the NATLab servers. Later this week I'll set up a Jenkins as a VM on the NATLab. That will give me a playground to test out Jenkins, and to see if I can get Gumby running from Jenkins as well. It's basically the manual steps I took to install Gumby (git cloning, then starting the dummy/remote experiment), but in Jenkins pipeline form. But first I need a Gumby run that actually completes. |
The prior work from Cornell from 2005: We used the client to test a diverse set of sixteen NATs in the lab |
Total number of NAT boxes today: 22 running + 2 others + 8 new = 32 total. New racks are installed at our building. The PC with 64GByte+16-cores for LXC containers has no Internet yet. Next target, copied from above: |
For setting up a Jenkins server, I use this role, no need to re-invent the wheel :) @remkonaber could you please tell me which Ansible script you particularly need or are interested in? I'm a bit hesitant with giving you access to all scripts since they contain sensitive information like passwords and filtering these out takes some time due to the large amount of scripts we have. |
@devos50 Thanks, I'll have a look at that role. My main reason to set up Jenkins myself is just to learn about it. Having examples in Galaxy is handy, but there are many, so it is good to know which one you prefer yourself. About your other Ansible scripts: I don't need access to those scripts, this was just a thought of Johan's. Ansible is pretty straightforward once you get the hang of it, and the modules are documented nicely. There are also plenty of examples to be found online and/or in Galaxy. Thanks anyway. Now for some photos of moving the NATlab. First we had to pack everything into a car: Here we have the NATlab skeleton set up, with server+kvm+switches and the shelves installed: |
@remkonaber I'd like to run my DHT with integrated NAT puncturing on your setup. How can I do this? |
@egbertbouman Right now, this isn't possible. Unfortunately it has taken many months before there was a working internet connection available for the NATlab server. Two days ago I finally got word that there is an outlet available. So the next step is to finish the NATlab setup: setting up all routers and the VLAN switch, and making sure they are all configured correctly. I hope to get this done before the end of June. On the software front, the work is setting up the NATlab as a part of Jenkins. Then it becomes possible to make a pipeline that uses Gumby on a control node to run software on every other node, let Gumby collect the data, and then let Jenkins collect that in turn. I'm going to make a pipeline to test the setup, which should serve as an example of how to run code. I'll update progress here, and will let you know when the NATlab is set up. |
@remkonaber OK, sounds good. Thanks! |
@egbertbouman Just an update. It seems the network connection still wasn't available. Stephen is trying to figure out what has gone wrong now. In the meantime, I have set up the power supplies, the VLAN switch, and 20 routers, and done the initial router configuration: Stephan couldn't give me an ETA on when the network might be up, so I'm waiting to hear from him again. |
Great to hear that there has been solid progress, hopefully the network
connection will get sorted out. Thanks for the update!
|
This master thesis work is expected to resume 1 March 2022. 🎉 🎉 🎉
|
another progress update phone call: upgrading code + swapping in prior efforts. run any .py on boxes. new todo: ipv8 PR testing |
Related work: https://tailscale.com/blog/how-nat-traversal-works/
|
As PC is replaced by Smartphone - we also see a shift of NAT/UDP puncturing and open point knowledge towards mobiles: btw new plan here: https://github.com/users/remkonaber/projects/6 |
A new plan indeed. With sprints of 2 weeks at a time. Those sprints have been tentatively filled in on this github project: https://github.com/users/remkonaber/projects/6 I spent the past 2 weeks (sprint) understanding the py-ipv8 code, setting up LXCs/Qemu/Proxmox locally, and setting up github stuff. I also had a chance to look at Gumby, and was pleasantly surprised it now works with py-ipv8 communities. It even works isolated with I have also created a sub-project for the Infrastructure: https://github.com/users/remkonaber/projects/8 And a sub-project for the Code that will run on the lab: https://github.com/users/remkonaber/projects/9 These are basically to-do lists I can link to issues in the various github repositories that I'll be creating for the code deliverables. The current sprint (2 weeks again), is about building my staging lab with Ansible and running some code, preferably py-ipv8 based. I'm opting to set up templates for lxc/qemu, so I can clone VMs for fast rebuilding of the lab to get fresh states each time. |
Extending the current sprint by a week. I'm simply not at the point where I can run ipv8 code on the nodes quickly. Getting everything to run on Proxmox is just taking longer than I anticipated; the actual plumbing work, that is, just adds up in time. I haven't had any stumbling blocks, except for scripting the adjustment of a base template (with the Packer Proxmox builder, for example). I'll skip that for now, and just use a recent (Debian) image, adjust it by hand, and use that as a template to clone from. That should be enough to get a small test community running on the nodes. I will revisit templates later when/if needed. For now I'll focus on just getting a full stack running so I can start to work on ipv8. |
I was going to commit v0.1.0 of my Ansible code to GitHub last week, but I just couldn't focus on it (fatigue from the booster shot, most likely). Today I sat down to clean up and commit the Ansible code. Here it is: Had a few issues, which mostly boil down to the Ansible Proxmox modules not implementing everything (correctly). I added these issues to natlab-ansible-infrastructure/issues. I mostly worked around them for now. As for running code on the VMs: I had my local Jenkins run an agent on one of the nodes. This worked fine. However, I found out that while I can easily use a Jenkins role to set up a Jenkins server on an LXC in Proxmox, automatically configuring Jenkins and then setting up nodes as slaves is slightly more convoluted. I did find some nice roles on Ansible Galaxy (from lean_delivery) that only require me to first set up Jenkins (Ansible), set up plugins on Jenkins (Ansible), configure plugins (JCasC / Groovy), and configure agents (Ansible / Groovy). Since this will still take some tinkering, I'll add this to a future sprint. So for now I'll set up some agents on my local Jenkins manually, because I want to focus on the Python ipv8 code for a bit. Updated the current sprint (4) on the master project to reflect this. |
Broken URL?? https://github.com/remkonaber/natlab-ansible-infrastructure Related work: Android vulnerability; Device Tracking via Linux’s New TCP Source Port Selection Algorithm. Btw, please explore the new master thesis format: 16 pages of thesis in IEEE format |
Not a broken link. It seems I just forgot to change it to public last night. Should work now. About the master thesis format: I found the LaTeX template for MSc theses on the MSc Project site of Distributed Systems. However, that does look different from the thesis you just linked. Is this other template available somewhere? Or should this page still be updated to include the new template? |
I spent the last sprint immersing myself in py-ipv8. Worked through the docs, and the tutorials, then dove into the ipv8 code. Since I was doing this, I figured I would look at gumby at the same time, since gumby can be used to run IPv8 overlays with the help of scenarios. First Gumby: I thought I was going to use Gumby to distribute code over the lxc nodes and use scenarios to run the experiments. In the past Gumby had the option to distribute code using ssh to nodes. However, this functionality has been removed at some point. Only running local and on DAS(5/6) is possible now. So I'm going to have to do this some other way. Since I'm planning to use Jenkins anyways, I should be ok with just using (parallel) pipelines with Jenkins to run code. Then IPv8: I began with the idea of adding more UDP ports by extending the "Endpoint" class, and adding a TraversalCommunity, and seeing how much I could do with that. I quickly ran into the question of what to do next, figuring out where state is stored, and how walkers work. So I started my first deep dive into the source of py-ipv8. I looked at the basic 4 messages in each community for peer discovery. I saw some extra bytes were sent with introduction requests, and thought: "Perhaps I could use that". However these bytes aren't sent on in followup puncture request, so this is a dead end. Plus these messages are used by every community and the bootstrap servers, so changing anything here is not a valid option. In order to do NAT testing/traversal, I'm going to have to extend the ipv8 code to store some more state, and work with multiple ports. The code dive showed me how deeply ingrained the concept of only having 1 UDP port is in the internal working of py-ipv8. Only 1 IP:port address is stored per "Peer", even for the own "Peer" only the last reported wan ip is stored. 
It also showed me that the walkers work with IP:port addresses instead of the concept of "Peers" (as does bootstrapping), and getting walkable addresses is getting addresses from the "Network" class. I was hoping to just subclass "Peer" and store more state there, but I'll have to extend the "Network" class as well. Then I looked at it from an overlay point of view. The basic community overlay introduces nodes to each other, this already punctures NAT in the symmetrical case, for the standard UDP port. I'll leave this port alone and do testing / traversal using fresh UDP ports that I open for the tests, this should assure that there is no prior state either in the NAT routers while running our tests, giving cleaner test results. The information that is gathered can be used to figure out the NAT type we are behind (forwarding rules/packet filter etc). Then various traversal techniques can be tried. The community overlay I will use to exchange messages between peers (directly or relayed), so that I can test every router and produce data result to turn into pictures for the various traversal techniques. Next sprint: My goal is to extend the "Endpoint", so it is possible to open extra UDP ports by the TraversalCommunity, then have peers exchange simple messages using these ports. A secondary goal is setting up a way to relay messages between two peers (in case they can't communicate directly). As usual, the master project has been updated. |
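As a rough illustration of the "fresh UDP ports" idea above, here is a sketch using only the Python standard library. It does not use or extend py-ipv8's `Endpoint` class; the function names are hypothetical, and the loopback address is used only so the example runs on a single machine.

```python
from __future__ import annotations

import socket


def open_udp_ports(count: int, host: str = "127.0.0.1") -> list[socket.socket]:
    """Open `count` fresh UDP sockets; binding to port 0 lets the OS pick a free port."""
    socks = []
    for _ in range(count):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind((host, 0))
        sock.settimeout(2.0)  # avoid blocking forever if a packet is dropped
        socks.append(sock)
    return socks


def exchange(sender: socket.socket, receiver: socket.socket, payload: bytes):
    """Send one datagram from sender to receiver; return the data and source address."""
    sender.sendto(payload, receiver.getsockname())
    return receiver.recvfrom(2048)


if __name__ == "__main__":
    socks = open_udp_ports(4)
    print("opened ports:", [s.getsockname()[1] for s in socks])
    data, source = exchange(socks[0], socks[1], b"ping")
    print("received", data, "from", source)
    for s in socks:
        s.close()
```

Because each test uses a socket that has never sent anything before, a NAT box in between would have no existing mapping for it, which is the "no prior state" property described above.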
You may want to look at the The

```python
import collections

# Import path as in the py-ipv8 source tree; adjust if your version differs.
from ipv8.messaging.interfaces.udp.endpoint import UDPEndpoint

PrimaryUDPv4Address = collections.namedtuple("PrimaryUDPv4Address", ["ip", "port"])
SecondaryUDPv4Address = collections.namedtuple("SecondaryUDPv4Address", ["ip", "port"])


class PrimaryEndpoint(UDPEndpoint):  # IPv4, but you can subclass from UDPv6Endpoint as well
    def notify_listeners(self, packet):
        # Rebind the source address to this endpoint's own address class.
        super().notify_listeners((PrimaryUDPv4Address(*packet[0]), packet[1]))


class SecondaryEndpoint(UDPEndpoint):
    def notify_listeners(self, packet):
        super().notify_listeners((SecondaryUDPv4Address(*packet[0]), packet[1]))
```

EDIT: In the previous version I omitted the address rebinding. Using the

I started creating some documentation on walking. It's not done but it may help you out (you can find it here). |
Indeed, I want to save time by using what is already implemented! I'm going to extend only where needed, and reuse what I can. I figure that I can subclass and only add/override what I need, without touching the workings of the classes I subclass. Some thoughts based on what you wrote / my ideas that I already had:
My thought was something like One thing I'm still not sure about: Will I need to know what endpoint a packet arrived on? It could be that I'm missing something while looking at it, but
Final Thoughts:
Quinten: I'm open to any more suggestions and/or thoughts that you have. Esp. about opening multiple endpoints, and storing the information about those endpoints. R. |
Disclaimer: there is more than one way to implement these things - don't regard the following suggestions as absolute truths. I fully support your overall conclusions and I'm only giving my perspective to try and make your life easier - feel free to ignore.
Personally, if it's just 256 ports, I would use dynamic subclassing (using Subclassing
Correct. Each subclass needs a unique name. However, if you dynamically generate the subclasses this should not be an issue.
I like this idea - it seems like a good abstraction level to manage ephemeral endpoints.
Right now: yes. The IPv8 internals assume that the address class is uniquely tied to some endpoint. This is also why I suggest(ed) using subclasses.
I just wanted to highlight this to explicitly confirm (because it's important). Yes. NAT mapping happens between unique source and destinations. If you send from a different port to the same address you have a high chance of being blocked again.
The IntroductionRequest and IntroductionResponse still have bits reserved for the connection type. The past ~9 years peers have only shared "unknown" with each other though :-) Perhaps you could use that (and maybe another bit or two of the unused flag bits in the introduction logic)? Personally, I'd just edit the
That seems like a good approach if you have custom logic, regardless of whatever you choose to do (either a Overall, I feel like you're avoiding editing the |
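One possible reading of the dynamic subclassing suggestion, sketched with Python's built-in `type()`. This is an assumption about the mechanism, not necessarily what was meant above. `UDPEndpointStub` is a hypothetical stand-in for py-ipv8's `UDPEndpoint` so that the sketch runs without IPv8 installed, and the generated class names are made up.

```python
import collections


class UDPEndpointStub:
    """Hypothetical stand-in for py-ipv8's UDPEndpoint; it only records packets."""

    def __init__(self):
        self.received = []

    def notify_listeners(self, packet):
        self.received.append(packet)


def make_endpoint_class(index, base=UDPEndpointStub):
    """Create an endpoint subclass with its own uniquely named address class."""
    address_cls = collections.namedtuple(f"ExtraUDPv4Address{index}", ["ip", "port"])

    def notify_listeners(self, packet):
        # Rebind the source address so it is tied to this specific endpoint class.
        base.notify_listeners(self, (address_cls(*packet[0]), packet[1]))

    return type(
        f"ExtraEndpoint{index}",  # every generated subclass gets a unique name
        (base,),
        {"notify_listeners": notify_listeners, "address_class": address_cls},
    )


if __name__ == "__main__":
    endpoint_classes = [make_endpoint_class(i) for i in range(256)]
    endpoint = endpoint_classes[7]()
    endpoint.notify_listeners((("10.0.0.1", 1234), b"data"))
    print(endpoint.received[0])
```

The base method is called explicitly instead of through `super()`, because a function defined outside a class body has no implicit class cell for the zero-argument form of `super()`.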
There are indeed multiple ways to implement things. That's exactly why I said I was open to suggestions. Always nice to hear other perspectives/ideas. So, thanks. And you are correct. I was trying to avoid editing classes in py-ipv8. I started looking at my code with the assumption that I would use the library, not change it (or only make small changes). Which means subclass if you need more functionality. I was going to try to monkeypatch
As for the current introduction messages: I did notice the reserved bits, but also saw they weren't actually used. I have thought about this for a while: In an ideal situation, each peer would know what type of NAT/Firewall they are behind, then when asking for an introduction from a peer But this is a full plan. For next sprint the goal will remain: Implement the multiple UDP ports, send some messages using those ports. And also try to relay a message between two peers using a helper peer. |
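A sketch of how a peer might work out its NAT mapping behaviour from what other peers report back. The category names follow the usual RFC 4787 terminology; "symmetrical" NAT as used in this thread roughly corresponds to the address-and-port-dependent case. The `Observation` type and the function are hypothetical and not part of py-ipv8; all observations are assumed to come from packets sent from one and the same local port.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    dest_ip: str      # where we sent a packet to
    dest_port: int
    mapped_port: int  # external source port the destination reported seeing


def classify_mapping(observations) -> str:
    """Classify NAT mapping behaviour for packets sent from a single local port."""
    destinations = {(o.dest_ip, o.dest_port) for o in observations}
    if len(destinations) < 2:
        return "unknown"  # one destination cannot reveal mapping behaviour
    if len({o.mapped_port for o in observations}) == 1:
        return "endpoint-independent"
    mapped_per_ip, dest_ports_per_ip = {}, {}
    for o in observations:
        mapped_per_ip.setdefault(o.dest_ip, set()).add(o.mapped_port)
        dest_ports_per_ip.setdefault(o.dest_ip, set()).add(o.dest_port)
    if any(len(ports) > 1 for ports in mapped_per_ip.values()):
        return "address-and-port-dependent"
    if any(len(ports) > 1 for ports in dest_ports_per_ip.values()):
        return "address-dependent"
    # Mappings differ per IP, but no IP was probed on two ports.
    return "address-dependent-or-stricter"


if __name__ == "__main__":
    probes = [
        Observation("198.51.100.1", 8000, 40001),
        Observation("198.51.100.1", 8001, 40002),
        Observation("203.0.113.5", 8000, 40003),
    ]
    print(classify_mapping(probes))
```

This only covers mapping behaviour; filtering behaviour (which inbound packets the box lets through) would need a separate set of tests.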
Ok. I got done what I wanted to do last sprint. And I did some more, since I was looking for a way to send some test messages, and figured I might as well make a rudimentary Walker with a For starters, I extended Adding the extra UDPEndpoints to Then I played around with payloads in their various forms, and added a RelayPayload that had a PingPayload nested in it. I used this to relay a ping-pong between two peers with the help of a 3rd. This was just a proof of concept for myself to see how I could relay a payload; I'll make this more generic later on. It also gave me a chance to experiment with caches and understand those better. The only thing I haven't looked at, but am also thinking about, is creating a simple DispersyBootstrapper that has the IPs of a few bootstrapper peers (part of my community) that won't be behind NAT boxes. That way I don't need to use the TU Delft bootstrap servers, and I can change the way peers introduce themselves. I'll leave this for a future sprint as well. Next sprint (not the current one), I'm going to clean up the code that I wrote, and start adding the logic behind traversal testing. In the current sprint my main focus is on running code with Jenkins; the goal is to get my community running on 4 LXC nodes in my staging lab using a pipeline. As usual, the master project has been updated. |
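To illustrate the idea of nesting a ping inside a relay message, here is a sketch with a made-up wire format built on `struct`. It is not py-ipv8's payload or serializer API, and the message type numbers and field layout are assumptions chosen for the example.

```python
import socket
import struct

PING, RELAY = 1, 2          # hypothetical message type numbers
RELAY_HEADER = "!B4sHH"     # type, IPv4 target, target port, inner length


def pack_ping(identifier: int) -> bytes:
    return struct.pack("!BH", PING, identifier)


def unpack_ping(data: bytes) -> int:
    msg_type, identifier = struct.unpack("!BH", data)
    if msg_type != PING:
        raise ValueError("not a ping message")
    return identifier


def pack_relay(target, inner: bytes) -> bytes:
    """Wrap an already serialized message for delivery to `target` via a helper peer."""
    ip, port = target
    return struct.pack(RELAY_HEADER, RELAY, socket.inet_aton(ip), port, len(inner)) + inner


def unpack_relay(data: bytes):
    """Return ((ip, port), inner_bytes); the helper forwards inner_bytes unchanged."""
    header_size = struct.calcsize(RELAY_HEADER)
    msg_type, raw_ip, port, length = struct.unpack(RELAY_HEADER, data[:header_size])
    if msg_type != RELAY:
        raise ValueError("not a relay message")
    inner = data[header_size:header_size + length]
    if len(inner) != length:
        raise ValueError("truncated relay message")
    return (socket.inet_ntoa(raw_ip), port), inner


if __name__ == "__main__":
    wrapped = pack_relay(("192.0.2.10", 8090), pack_ping(42))
    target, inner = unpack_relay(wrapped)
    print(target, unpack_ping(inner))
```

Because the helper peer only reads the relay header and forwards the inner bytes untouched, the same wrapper could carry other message types later, which fits the plan to make the relay more generic.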
Great to see this progress 💪👌 "overcoming hostile NAT-based networks using the birthday paradox", could be central thesis message. |
Previous sprint went so-so. I had some issues with the new prescription lenses I started using last week. Focusing on computer screen text simply wasn't working, so new lenses were ordered, but they didn't arrive till yesterday. However, using an old pair of glasses and big fonts I did manage to get some stuff done: Running Jenkins pipelines that pull from GitHub is straightforward, as I figured it would be. And with webhooks it should be possible to run code directly on the NATLab after a push or a pull request, but I'm unsure if this will work with the TU Delft firewall (I can't connect to the NATLab Proxmox without tunneling with SSH first, but perhaps the firewall will allow GitHub webhooks through?). I'll look into this once I have Jenkins running on the NATLab. I have also discussed with Stephen via mail the IPv4 block in use for the routers / VMs. The IPs that are currently manually assigned to the routers are still available, so I should be able to turn the routers on without IP issues. We also discussed some improvements: Right now the internet-facing switch is set to unmanaged, without DHCP for the routers / VMs. However, I'm going to collect the MAC addresses, and should be able to assign MAC addresses to VMs on creation (if it works properly with Ansible, TBD), so it should be possible to switch everything to DHCP. We also discussed DNS names for the routers / nodes; this isn't needed for now, but would be a nice addition as well. Now, for this sprint, my main goal will be getting the NATLab up and running again, using the latest Proxmox software and with the ability to power-cycle routers, so I can start using the routers with my ipv8 code. As a side note on this: Proxmox 7.3 was released at the end of November, and I did my test setup with 7.2, so I'll be retesting my Ansible code with 7.3 this sprint. I figure it will all work, but I'll have to check that to confirm. The master project current sprint has been updated. |
How is the progress going? Idea in collaboration with #7074: Every smartphone bootstraps in IPv8 and finds peers. By connecting to peers with public IPv4 addresses its possible to communicate. Peers behind carrier-grade NAT are hard to connect to. Any public connectable peer can be asked to relay a SIM cards are coming, so we have hopefully a match for birthday paradox testing. @rahimklaber is working on FROST+reliable networking. Can the Python of NATLab communicate with Kotlin of Android devices within an IPv8 NAT-Puncture community? |
Next week I should be able to test the NAT boxes for NAT types. I have updated my planning on the master project page. As I said just before Christmas when I was at TU Delft: I needed a few weeks in January to sort out some other stuff (done, shown as a break); afterwards I have an open calendar again. I was planning on starting up again 2 weeks ago, but unfortunately I got sick (shown as a sprint, and done). This week my brain started working again, and I'm working on getting the Proxmox server running, so I can connect to the NAT boxes again. I'm also setting up the VMs and Jenkins so that next sprint I can run code. The main plan for the following sprint is to test the boxes with ipv8. About Kotlin / Android: I can't answer this, as I have no experience with Kotlin / Android and/or running ipv8 on Android. |
|
@remkonaber |
OK. The past 2 weeks I took stock of where I was: Before my injury, I was working on getting Jenkins working on my staging Proxmox. However, one of the issues I kept having was constant SSH connection failures while running Ansible to set it up. I tried using ControlMasters, but that didn't help either. Even with 1 connection max and pauses between calls I'm getting errors. I think it has to do with triple-nested VMs, combined with hardware that is too slow. As discussed with Johan over the phone, I'm dropping this for now. In the future I might try to set up Jenkins on a VM on the NATLab server instead, to see if I don't get the SSH errors there (only a double-nested VM then, and better hardware). Because the idea is great: having a Jenkins instance running on Proxmox, then using Ansible and the Jenkins Swarm plugin to set up all the VMs automatically and having them report back to Jenkins to be available as agents. Having Jenkins on the same private internal network as all the VMs would make using Jenkins much easier than using my local Jenkins as master, where I have to tunnel through 3x SSH connections to even reach the VMs. The coming weeks I'll focus on creating the Ansible part to set up the NATLab server. Then all the VMs. I'll check the TP-Link switch, making sure all the VLANs are still set correctly. Then I'll see if I can connect to each of the NAT boxes' GUIs using tunneling through the TU Delft firewall, and port redirection. While I'm doing this, I'll make accounts for Orestis and add his SSH key so he has access to everything. R. |
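A small sketch of how the multi-hop tunneling and port redirection described above could be scripted. It only builds an OpenSSH command line, using the standard `-J` (jump hosts) and `-L` (local port forward) options. All host names and addresses are hypothetical placeholders, not the real NATLab machines.

```python
from __future__ import annotations


def ssh_command(target: str, jumps: list[str], local_forward=None) -> list[str]:
    """Build an ssh argument list that reaches `target` through a chain of jump hosts.

    local_forward is an optional (local_port, remote_host, remote_port) tuple,
    for example to reach the web GUI of a NAT box from a local browser.
    """
    cmd = ["ssh"]
    if jumps:
        cmd += ["-J", ",".join(jumps)]  # hops are traversed in the given order
    if local_forward:
        local_port, remote_host, remote_port = local_forward
        cmd += ["-L", f"{local_port}:{remote_host}:{remote_port}"]
    cmd.append(target)
    return cmd


if __name__ == "__main__":
    # Hypothetical names: a gateway, the lab server, and one container behind it.
    command = ssh_command(
        "user@natnode01",
        ["user@gateway.example.org", "user@natlab-server"],
        local_forward=(8080, "192.168.1.1", 80),
    )
    print(" ".join(command))
```

The resulting list can be passed to `subprocess.run`. Keeping the hop list in one place makes it easy to drop a hop once Jenkins runs inside the lab network itself.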
Progress meeting.... server still operational 😮 💎 😮 Another student is working on the 5G side with Android and bought a few SIM cards Brainstorm for experimental results, use few thousand daily users of Tribler??? They use standard IPv8. Modification of Tribler code should be avoided. Test how many peers are connectable or something more exciting? Another idea, compare standard IPv8 technique with "Remko improvements". Big milestone 0: thesis chapter written |
This is the placeholder for the Ansible, Proxmox, LXC container based infrastructure. We have a 48-port switch, and currently 19 NAT boxes are operational.
Research idea by Remko: open 1000 listen ports on two sides, brute force a connection across a symmetrical NAT box.
All rough work-in-progress scripts will appear here: https://github.com/remkonaber
Builds upon the work from #2131 and is required for: #2623
Test code to run using this infrastructure: https://github.com/YourDaddyIsHere/walker_with_clean_design
2013 NAT results: