NAT puncturing infrastructure #2754

Open
synctext opened this issue Jan 25, 2017 · 40 comments
@synctext
Member

synctext commented Jan 25, 2017

This is the placeholder for the Ansible, Proxmox, and LXC container based infrastructure. We have a 48-port switch, with 19 NAT boxes currently operational.
Research idea by Remko: open 1000 listen ports on two sides, then brute force a connection across a symmetric NAT box.

All rough work-in-progress scripts will appear here: https://github.com/remkonaber

Builds upon the work from #2131 and is required for: #2623

Test code to run using this infrastructure: https://github.com/YourDaddyIsHere/walker_with_clean_design

2013 NAT results:
[image: nat]

@remkonaber

remkonaber commented Jan 25, 2017

Master Thesis work on the NATLab can be found in the following repositories:

This comment will be updated to contain quick links to the various parts of my master project. These same links can be found on the metaproject repository README.

@YourDaddyIsHere

YourDaddyIsHere commented Jan 25, 2017

I am the guy in #2623.
The code in https://github.com/YourDaddyIsHere/walker_with_clean_design is quite... simple. It is actually the walker currently used in Dispersy and Tribler. I didn't change the logic or strategy of the walker.
Some functions relevant to the walker have been pasted into the TestCommunity, which inherits from MultiChainCommunity. The mid is hard-coded (copied from MultiChainCommunity), so it is walking in the MultiChainCommunity, but for experimental use it is better to create a new community.

I will give you an independent walker later, with no dependency on Dispersy or Tribler.

My email address is [email protected], send me an email if you want

@remkonaber

@synctext Pushed my current Ansible code to Develop branch. This code runs for me on my staging server. The basic structure is there, and I'm able to create multiple VMs with it if I specify them in the host_var file for the server.

I now have a thorough understanding of how Ansible works, and will be extending this code in the coming weeks. The next goal is to auto-provision the nodes with users and SSH keys. I found out it is possible from Ansible to take someone's public key from GitHub and put it directly into authorized_keys. We have to discuss whether we want to forward the SSH ports of the VMs to the outside, or only allow tunneling through our main server. Both methods have pros and cons.
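A minimal Python sketch of the key-provisioning idea, for illustration only. It relies on GitHub serving a user's public keys as plain text at `https://github.com/<username>.keys`; the helper names `github_keys_url` and `merge_authorized_keys` are hypothetical, and the actual download and file write are left out so the logic stays easy to check.

```python
def github_keys_url(username: str) -> str:
    """URL where GitHub serves a user's public SSH keys as plain text."""
    return f"https://github.com/{username}.keys"


def merge_authorized_keys(existing: str, fetched: str) -> str:
    """Append fetched keys to existing authorized_keys content, skipping duplicates."""
    lines = [line.strip() for line in existing.splitlines() if line.strip()]
    seen = set(lines)
    for key in fetched.splitlines():
        key = key.strip()
        if key and key not in seen:
            lines.append(key)
            seen.add(key)
    return "\n".join(lines) + "\n"
```

In the Ansible code itself a dedicated module would probably replace this; the sketch only shows what ends up in the file.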

About the NATLab server: I have updated the server to version 4.4, and it looks nice. I was able to power-cycle the NAT boxes, and I tried to connect to 4 of them by manually setting up tunnels. Of the 4 boxes, I was able to connect to 2, 1 was dead, and 1 didn't let me log in. Tomorrow I'm going to check the rest of the boxes, and see if I can reset the boxes that have issues.

@synctext
Member Author

synctext commented Feb 15, 2017

the fun stuff: https://github.com/remkonaber/natlab-ansible/blob/develop/roles/natnode_create/tasks/main.yml

---

- name: Create LXC node
  proxmox:
    vmid: "{{ item }}"
    hostname: "node{{ item }}"
    password: "{{ lxc_password }}"
    netif: '{"net0":"name=eth0,ip={{ internal_network }}.{{ item }}/{{ internal_netmask }},bridge=vmbr1", 
             "net1":"name=eth1,ip={{ vlan_network }}.{{ item }}/{{ vlan_netmask }},bridge=vmbr1,tag={{ item  }}"
             }'
    ostemplate: "{{ lxc_template }}"
    node: "{{ proxmox_node }}" 
    api_host: "{{ proxmox_host }}"
    api_user: "{{ proxmox_user }}"
    api_password: "{{ lookup('env','PROXMOX_PASSWORD') }}"
  with_items: "{{ natnodes }}"
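To make the templated `netif` value easier to read, here is a small Python sketch of what it would render to for one node. The function name `render_netif` and the example network values used with it are assumptions for illustration; the real values live in the Ansible variables.

```python
import json


def render_netif(item: int, internal_network: str, internal_netmask: int,
                 vlan_network: str, vlan_netmask: int) -> str:
    """Build the netif JSON string the task would pass to Proxmox for one node."""
    netif = {
        # eth0: internal management network, untagged
        "net0": f"name=eth0,ip={internal_network}.{item}/{internal_netmask},bridge=vmbr1",
        # eth1: per-node VLAN, tagged with the node number
        "net1": f"name=eth1,ip={vlan_network}.{item}/{vlan_netmask},bridge=vmbr1,tag={item}",
    }
    return json.dumps(netif)


# Example with made-up networks (assumed values, not the lab's real ones).
print(render_netif(101, "10.0.0", 24, "192.168.0", 24))
```

Each node gets the same host number on both networks, and the VLAN tag equals the node number, which is what ties one container to one NAT box.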

Tasks for coming weeks:

  • complete setup boxes
  • experiment master container
  • Jenkins integration
  • run simple hello_world.py on all nodes
  • ignore Gumby for now or copy DAS5 approach and add NatLab.ewi.tudelft support ?

The fun part in the future is still: open 1000 listen ports on two sides, brute force a connection across a symmetric NAT box. So get the hardware + firmware settings that put a box in this nasty mode.

  • new boxes take 15-30 minutes to set up.
  • thesis science idea: medium latency node, high-latency node, highly-variable latency node, lossy node + combine (tc - show / manipulate traffic control settings)
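The latency and loss profiles from the thesis idea could be expressed as `tc` netem commands. The sketch below only builds the argument lists; the profile names and all numbers are placeholders chosen for illustration, not measured or agreed values.

```python
from typing import Dict, List

# Hypothetical link profiles; the numbers are placeholders.
PROFILES: Dict[str, List[str]] = {
    "medium_latency": ["delay", "50ms"],
    "high_latency": ["delay", "500ms"],
    "variable_latency": ["delay", "100ms", "80ms"],  # base delay plus jitter
    "lossy": ["loss", "5%"],
}


def netem_command(device: str, *profiles: str) -> List[str]:
    """Return a `tc` argument list applying the combined netem profiles to a device."""
    args = ["tc", "qdisc", "add", "dev", device, "root", "netem"]
    for name in profiles:
        args.extend(PROFILES[name])
    return args


print(" ".join(netem_command("eth1", "high_latency", "lossy")))
```

Only one delay profile should be combined with the loss profile at a time, since netem takes a single delay setting. Running the command (for example via `subprocess.run(cmd, check=True)`) needs root inside the container.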

@remkonaber

@synctext The branch develop is updated with current code.

The NAT nodes are completely set up (see screenshot below), and the Ansible code to create and remove the nodes is working well. Initial provisioning is working as well; more will be added as needed:

[image: natlab_proxmox]

I read up on Jenkins, created my own Jenkins install + Jenkins slave on my staging server. I created a simple "Hello World" pipeline, and let it run on the slave:

[image: natlab_jenkins]

I also spent time on Gumby. Created my own test experiment, and tested the dummy experiment (Hello World style). Figured out how to run it locally, how to run it on 1 remote. Documentation for Gumby is severely lacking, so this took me quite some time. However, it still seems the best choice.

The integration of the Jenkins + Slave >> Gumby + NATNodes isn't done yet. This consists of 2 parts that still have to come together:

  1. I'm working to turn my manually hacked Gumby setup into Ansible code for the "Gumby + NATNodes" part. I need to add code to provision the nodes with a user that can run Gumby in a virtualenv Python environment. I need to figure out what to set up with Ansible and which parts will have to be done with Gumby, what software to install on each node, etc.
  2. The "Jenkins + Slave" part is different. I have to set up pipelines that pull Gumby plus my own code, put them together, let Gumby run the experiment, and then collect the results. I'll probably opt to quickly set up a Jenkins on the NATLab server for testing purposes. Then, when the pipelines run, I can add them to Tribler-Jenkins and run them from there. I studied the available pipelines in Tribler-Jenkins, and I'm confident I can get this working, but I first need to get Part 1 done before I can test it.

@synctext
Member Author

synctext commented Mar 22, 2017

Gumby has multiple semi-identical Bash scripts for each experiment which configure servers. Ansible is specifically designed for getting a server ready for Dispersy or Tribler. Thus the Gumby setting local_setup_cmd = 'das4_setup.sh' and the Ansible code above overlap.

Each LXC container controls 1 NAT box, has 1 IPv4 address, and is a clean Ubuntu 16.04. It thus needs Python and everything else installed. This thesis uses a wipe-upon-completion security policy.

Thesis material: provisioning pipeline, password-less remote login, jenkins remote login, remote access to web interface of NAT boxes, ssh tunneling, etc.

@remkonaber

remkonaber commented Apr 3, 2017

Overview of the NAT routers currently setup on the NATLab, with working notes, as requested:
[image: natlab old routers setup]

More NAT routers will be ordered and installed soon(™).

@remkonaber

Coming to install more routers tomorrow, 7 out of 31 have arrived. Hopefully this number will increase, but there seems to be a slight issue in the supply line. I will prepare the 4 new shelves, and the 7 routers that have arrived. The rest will be installed at a later date.

I believe I was able to get part of the dummy/remote experiment of Gumby working using 3 nodes of the NATLab. Gumby was able to copy the workspaces to the other nodes and then started running the dummy sleep commands on each node. The commands ran, but then it hung for unknown reasons. At first I thought it was waiting on the tracker_cmd, but after removing it I get the same hang. I think I have narrowed it down to an error related to the locale settings in the LXC container while setting up a virtualenv. For some reason the locale isn't set correctly, so I'm now looking for a way to fix that when I spin up the VM in the first place.

I now need a real project to test Gumby with, to understand Gumby better. I had a look in the Tribler Jenkins to see if I could find a project I could run, but I couldn't find anything simple enough to test with. I figure I'll create my own Hello World project that outputs something simple, to learn Gumby step by step. This could then also serve as an example of how to run an experiment on the NATLab servers.

Later this week I'll set up a Jenkins as a VM on the NATLab. That will give me a playground to test out Jenkins, and to see if I can get Gumby running from Jenkins as well. It's basically the manual steps I took to install Gumby (git cloning, then starting the dummy/remote experiment), but in Jenkins pipeline form. But first I need a Gumby run that actually completes.

@synctext
Member Author

The prior work from Cornell from 2005: We used the client to test a diverse set of sixteen NATs in the lab

@synctext
Member Author

Total amount of NAT boxes today: 22 running + 2 others + 8 new = 32 total

New racks are installed at our building. PC with 64GByte+16-cores for LXC containers has no Internet yet.
@devos50 please share your Ansible scripts with @remkonaber. Student developed Ansible script to install a Jenkins server from scratch (obviously I accused him of overengineering..)...

Next target, copied from above: run simple hello_world.py on all nodes

@devos50
Contributor

devos50 commented Mar 24, 2018

For setting up a Jenkins server, I use this role, no need to re-invent the wheel :)

@remkonaber could you please tell me which Ansible script you particularly need or are interested in? I'm a bit hesitant with giving you access to all scripts since they contain sensitive information like passwords and filtering these out takes some time due to the large amount of scripts we have.

@remkonaber

@devos50 Thanks, I'll have a look at that role. My main reason to set up Jenkins myself is just to learn about it. Having examples in Galaxy is handy, but there are many, so it helps to know which one you prefer yourself.

About your other Ansible scripts: I don't need access to those, this was just a thought of Johan's. Ansible is pretty straightforward once you get the hang of it, and the modules are documented nicely. There are also plenty of examples to be found online and/or in Galaxy. Thanks anyway.

Now for some photos of moving the NATlab. First we had to pack everything into a car:

[image: img_0839]

Here we have the NATlab skeleton set up, server+kvm+switches and the shelves installed:
[image: img_0845]
Once there is an Internet connection, I'll come and install all the NAT boxes.

@egbertbouman
Member

@remkonaber I'd like to run my DHT with integrated NAT puncturing on your setup. How can I do this?

@remkonaber

remkonaber commented Jun 15, 2018

@egbertbouman Right now, this isn't possible. Unfortunately it has taken many months before there was a working internet connection available for the NATlab server. Two days ago I finally got the word that there is an outlet available. So now the next step is to finish the NATlab setup, setting up all routers, the vlan-switch, and making sure they are all set correctly. I hope to get this done before the end of June.

On the software front, the next step is setting up the NATlab as a part of Jenkins. Then it becomes possible to make a pipeline that uses Gumby on a control node to run software on every other node, lets Gumby collect the data, and then has Jenkins collect that again. I'm going to make a pipeline to test the setup, which should serve as an example of how to run code.

I'll update progress here, and will let you know when the NATlab is set up.

@egbertbouman
Member

@remkonaber OK, sounds good. Thanks!

@remkonaber

@egbertbouman Just an update. It seems the network connection still isn't available. Stephen is trying to figure out what has gone wrong now. In the meantime, I have set up the power supplies, the VLAN switch, and 20 routers, and done the initial router configuration:

[image: natlab]

Stephan couldn't give me an ETA on when the network might be up, so I'm waiting to hear from him again.

@egbertbouman
Member

egbertbouman commented Jul 13, 2018 via email

@synctext
Member Author

synctext commented Feb 7, 2022

This master thesis work is expected to resume 1 March 2022. 🎉 🎉 🎉
After numerous years, it's still relevant and unsolved. ToDo:

  • upgrade OS from experimental NAT-lab server
  • buy recent NAT/wifi boxes
  • investigate IPv8 network and read IPv8 tutorials
  • join IPv8 network from the NAT boxes and measure NAT puncture statistics
  • create an IPv8 deployment test
  • integrate with pull request tester on Github
  • re-produce the picture of the 2013 NAT results (displayed at top of this page).

@synctext
Member Author

another progress update phone call: upgrading code + swapping in prior efforts. run any .py on boxes. new todo: ipv8 PR testing

@synctext
Member Author

synctext commented Sep 6, 2022

Related work:
Stunner: A smart phone trace for developing decentralized edge systems
To achieve this for the domain of distributed smartphone applications, for many years we have been collecting data via smartphones concerning NAT type, the availability of WiFi and cellular networks, the battery level, and many more attributes. Recently, we enhanced our data collecting Android app Stunner by taking actual P2P measurements. Here, we outline our data collection method and the technical details, including some challenges we faced with data cleansing. We present a preliminary set of statistics based on the data for illustration.

https://tailscale.com/blog/how-nat-traversal-works/
We can do much better than that, with the help of the birthday paradox. Rather than open 1 port on the hard side and have the easy side try 65,535 possibilities, let’s open, say, 256 ports on the hard side (by having 256 sockets sending to the easy side’s ip:port), and have the easy side probe target ports at random.

| Number of random probes | Chance of success |
|---|---|
| 174 | 50% |
| 256 | 64% |
| 1024 | 98% |
| 2048 | 99.9% |

If we stick with a fairly modest probing rate of 100 ports/sec, half the time we’ll get through in under 2 seconds. And even if we get unlucky, 20 seconds in we’re virtually guaranteed to have found a way in, after probing less than 4% of the total search space.
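The numbers quoted above can be approximated with a few lines of Python. This is a simplified model, assuming each probe picks a port uniformly at random (with replacement) out of 65,535 and that 256 of those ports are open; the function names are made up for this sketch. Under that assumption the results land within roughly a percentage point of the quoted table rather than matching it digit for digit.

```python
def success_probability(probes: int, open_ports: int = 256, port_space: int = 65535) -> float:
    """Chance that at least one of `probes` uniformly random probes hits an open port."""
    miss_one = 1.0 - open_ports / port_space  # chance a single probe misses
    return 1.0 - miss_one ** probes


def seconds_needed(probes: int, rate_per_second: float = 100.0) -> float:
    """Time to send `probes` probes at a fixed probing rate."""
    return probes / rate_per_second


for probes in (174, 256, 1024, 2048):
    print(probes, f"{success_probability(probes):.3f}", f"{seconds_needed(probes):.2f}s")
```

At 100 probes per second, 174 probes take 1.74 seconds, which is where the "under 2 seconds half the time" figure comes from. Twenty seconds of probing covers 2,000 ports, about 3% of the port space.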

@synctext
Member Author

As PC is replaced by Smartphone - we also see a shift of NAT/UDP puncturing and open point knowledge towards mobiles:
https://flyer.sis.smu.edu.sg/ndss19.pdf

btw new plan here: https://github.com/users/remkonaber/projects/6

@remkonaber

A new plan indeed. With sprints of 2 weeks at a time. Those sprints have been tentatively filled in on this github project:

https://github.com/users/remkonaber/projects/6

I spent the past 2 weeks (sprint) understanding the py-ipv8 code, setting up LXCs/Qemu/Proxmox locally, and setting up github stuff. I also had a chance to look at Gumby, and was pleasantly surprised it now works with py-ipv8 communities. It even works isolated with isolate_ipv8_overlay, I'm looking forward to working with this.

I have also created a sub-project for the Infrastructure:

https://github.com/users/remkonaber/projects/8

And a sub-project for the Code that will run on the lab:

https://github.com/users/remkonaber/projects/9

These are basically to-do lists I can link to issues in the various github repositories that I'll be creating for the code deliverables.

The current sprint (2 weeks again), is about building my staging lab with Ansible and running some code, preferably py-ipv8 based. I'm opting to set up templates for lxc/qemu, so I can clone VMs for fast rebuilding of the lab to get fresh states each time.

@remkonaber

Extending the current sprint by a week. I'm simply not at the point where I can run ipv8 code on the nodes quickly. Getting everything to run on Proxmox is just taking longer than I anticipated. The actual plumbing work, that is; it just adds up in time.

I haven't hit any stumbling blocks, except for building an adjusted base template with a script (with the Packer Proxmox builder, for example). I'll skip that for now, and just use a recent (Debian) image, adjust it by hand, and use that as a template to clone from. That should be enough to get a small test community running on the nodes. I will revisit templates later when/if needed. For now I'll focus on just getting a full stack running so I can start to work on ipv8.

@remkonaber

I was going to commit v0.1.0 of my Ansible code to GitHub last week, but I just couldn't focus on it (fatigue from the booster shot, most likely). Today I sat down to clean up and commit the Ansible code. Here it is:

natlab-ansible-infrastructure

I had a few issues, which mostly boil down to the Ansible Proxmox modules not implementing everything (correctly). I added these issues to natlab-ansible-infrastructure/issues. I mostly worked around them for now.

As for running code on the VMs: I had my local Jenkins run an agent on one of the nodes. This worked fine. However, I found out that while I can easily use a jenkins role to set up Jenkins Server on an LXC in Proxmox, automatically configuring Jenkins and then setting up nodes as slaves is slightly more convoluted.

I did find some nice roles on Ansible Galaxy (from lean_delivery) that only requires me to first set up Jenkins (Ansible), set up plugins on Jenkins (Ansible), configure plugins (JCasC / Groovy), configure agents (Ansible / Groovy). Since this will still take some tinkering, I'll add this to a future sprint.

So for now I'll setup some agents on my local Jenkins manually, because I want to focus on the python ipv8 code for a bit. Updated the current sprint (4) on the master project to reflect this.

@synctext
Member Author

synctext commented Oct 21, 2022

@remkonaber

Not a broken link. I just forgot to change it to public last night it seems. Should work now.

About the master thesis format: I found the LaTeX template for MSc theses on the MSc Project site of Distributed Systems. However, that looks different from the thesis you just linked. Is this other template available somewhere? Or should this page still be updated to include the new template?

@remkonaber

I spent the last sprint immersing myself in py-ipv8. Worked through the docs, and the tutorials, then dove into the ipv8 code. Since I was doing this, I figured I would look at gumby at the same time, since gumby can be used to run IPv8 overlays with the help of scenarios.

First Gumby: I thought I was going to use Gumby to distribute code over the lxc nodes and use scenarios to run the experiments. In the past Gumby had the option to distribute code using ssh to nodes. However, this functionality has been removed at some point. Only running local and on DAS(5/6) is possible now. So I'm going to have to do this some other way. Since I'm planning to use Jenkins anyways, I should be ok with just using (parallel) pipelines with Jenkins to run code.

Then IPv8: I began with the idea of adding more UDP ports by extending the "Endpoint" class, and adding a TraversalCommunity, and seeing how much I could do with that. I quickly ran into the question of what to do next, figuring out where state is stored, and how walkers work. So I started my first deep dive into the source of py-ipv8. I looked at the basic 4 messages in each community for peer discovery. I saw some extra bytes were sent with introduction requests, and thought: "Perhaps I could use that". However these bytes aren't sent on in followup puncture request, so this is a dead end. Plus these messages are used by every community and the bootstrap servers, so changing anything here is not a valid option.

In order to do NAT testing/traversal, I'm going to have to extend the ipv8 code to store some more state, and work with multiple ports. The code dive showed me how deeply ingrained the concept of only having 1 UDP port is in the internal working of py-ipv8. Only 1 IP:port address is stored per "Peer", even for the own "Peer" only the last reported wan ip is stored. It also showed me that the walkers work with IP:port addresses instead of the concept of "Peers" (as does bootstrapping), and getting walkable addresses is getting addresses from the "Network" class. I was hoping to just subclass "Peer" and store more state there, but I'll have to extend the "Network" class as well.

Then I looked at it from an overlay point of view. The basic community overlay introduces nodes to each other, and this already punctures NAT in the symmetrical case for the standard UDP port. I'll leave this port alone and do testing / traversal using fresh UDP ports that I open for the tests. This should ensure that there is no prior state in the NAT routers while running our tests, giving cleaner test results. The information that is gathered can be used to figure out the NAT type we are behind (forwarding rules / packet filter etc). Then various traversal techniques can be tried. I will use the community overlay to exchange messages between peers (directly or relayed), so that I can test every router and produce data to turn into pictures for the various traversal techniques.
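Opening fresh UDP ports for a test can be sketched with plain sockets, independent of py-ipv8. The helper name `open_fresh_udp_ports` is hypothetical, and how these sockets would be wired into an IPv8 endpoint is deliberately left open here.

```python
import socket
from typing import List


def open_fresh_udp_ports(count: int, host: str = "0.0.0.0") -> List[socket.socket]:
    """Open `count` UDP sockets on OS-assigned ports and return them."""
    sockets: List[socket.socket] = []
    try:
        for _ in range(count):
            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            sock.bind((host, 0))  # port 0 lets the OS pick an unused port
            sockets.append(sock)
    except OSError:
        # Clean up what was already opened before re-raising.
        for sock in sockets:
            sock.close()
        raise
    return sockets


if __name__ == "__main__":
    socks = open_fresh_udp_ports(4, "127.0.0.1")
    print([s.getsockname()[1] for s in socks])
    for s in socks:
        s.close()
```

Because each socket is new, the NAT box has no existing mapping for it, which is the "no prior state" property the tests rely on.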

Next sprint: My goal is to extend the "Endpoint", so it is possible to open extra UDP ports by the TraversalCommunity, then have peers exchange simple messages using these ports. A secondary goal is setting up a way to relay messages between two peers (in case they can't communicate directly).

As usual, the master project has been updated.

@qstokkink
Contributor

qstokkink commented Nov 10, 2022

You may want to look at the DispatcherEndpoint, which allows you to hook in zero-or-more other Endpoints and support multiple ports. Here's an example in Tribler.

The Peer class does have an address, but this is simply shorthand to get the most preferred interface. If you add interfaces yourself, you can access each and every one of them by inspecting addresses. One limitation here is that addresses are remembered by class, so you would need to have a SecondaryEndpoint (or some other custom name) subclass of whatever endpoint you want a second one of. For example, this would be sufficient:

import collections

from ipv8.messaging.interfaces.udp.endpoint import UDPEndpoint

# One distinct address class per endpoint, so addresses can be told apart by class.
PrimaryUDPv4Address = collections.namedtuple("PrimaryUDPv4Address", ["ip", "port"])
SecondaryUDPv4Address = collections.namedtuple("SecondaryUDPv4Address", ["ip", "port"])

class PrimaryEndpoint(UDPEndpoint):  # IPv4 but you can subclass from UDPv6Endpoint as well
    def notify_listeners(self, packet):
        # Rebind the source address to this endpoint's address class.
        super().notify_listeners((PrimaryUDPv4Address(*packet[0]), packet[1]))

class SecondaryEndpoint(UDPEndpoint):
    def notify_listeners(self, packet):
        super().notify_listeners((SecondaryUDPv4Address(*packet[0]), packet[1]))

EDIT: In the previous version I omitted the address rebinding.

Using the DispatcherEndpoint to offload to subclasses and explicitly accessing Peer.addresses should save you from extending Peer and Network (and save you a lot of time).

I started creating some documentation on walking. It's not done but it may help you out (you can find it here).

@remkonaber

Indeed, I want to save time by using what is already implemented! I'm only going to extend where needed, and reuse what I can. I figure that I can subclass and only add/override what I need, without touching the working of the classes I subclass.

Some thoughts based on what you wrote / my ideas that I already had:

SecondaryEndpoint: Extending it like this could work, but what if I want to open more than one? What if I want to open 128 or 256 ports? (NAT puncturing using birthday paradox) Is there an easy way to do that? And when sending to a peer, I'll have to know which endpoint I'm trying to reach, and not just pick the preferred address. So my thought was to subclass Peer to get for example "TestEndpoint.some_identifier". Not storing it in self._addresses, but instead creating something like self._testendpoints in a subclass TraversalPeer is what I was thinking about.

DispatcherEndpoint: The UDP / UDPv6 endpoints are stored in a dict using a dictionary comprehension in self.interfaces. The key is expected to be the interface name UDPIPv4 or UDPIPv6, since it is used to instance the corresponding endpoints. Correct me if I'm wrong (because I could be missing something in this dict comprehension), but doesn't that mean only the one "UDPIPv4" endpoint is stored, even if I supply the configuration with more. I could extend this like SecondaryEndpoint, to add more named UDPIPv4 ports to one DispatcherEndpoint. But, again, what if I want to open multiple endpoints, perhaps dynamically, based on what traversal test I want to try?
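The concern about the dict comprehension can be illustrated with a toy registry keyed by interface name. This is not the DispatcherEndpoint code, just a hypothetical `build_interfaces` helper showing what happens when the same name is supplied twice.

```python
def build_interfaces(names):
    """Mimic a name-keyed registry: one entry per distinct interface name."""
    return {name: f"endpoint for {name}" for name in names}


interfaces = build_interfaces(["UDPIPv4", "UDPIPv4", "UDPIPv6"])
print(sorted(interfaces))  # the duplicate "UDPIPv4" collapses into a single key
```

Any scheme that stores endpoints in a name-keyed dict therefore needs a unique name per endpoint.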

My thought was something like MultiDispatcherEndpoint or TraversalEndpoint subclass that just wraps the normal DispatcherEndpoint, but adds the ability to open more endpoints.

One thing I'm still not sure about: Will I need to know what endpoint a packet arrived on? It could be that I'm missing something while looking at it, but notify_listeners only passes on source address and the data packet. If you only have one port open, you know exactly where it arrived. If you have multiple ports open and you receive a packet on one of them, you might need extra information. I could send that extra information by extending the tuple in the call to notify_listeners. That tuple travels from UDPEndpoint to the Community through the notify_listeners chain and ends up in on_packet, where it would crash during tuple unpacking unless I override on_packet in the TraversalCommunity. So this is a possibility, but it also means I have to subclass UDPEndpoint to actually pass the extra information on in the tuple.

Network: Here I don't know yet what extra information I might need. I was thinking about storing some extra information about what type of connection each peer has (Type: Open / Symmetric / Asymmetric, PacketFilter: Port / Address dependent, etc). That way you know if you can connect directly to a peer, or if you need to get the peer to puncture its side or do more. To do this, I could either store dictionaries, or create a class TraversalData, which I then store in a TraversalNetwork subclass of Network.
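One possible shape for such a TraversalData class, as a hedged sketch. The enum members follow the categories named above; `TraversalRegistry` is a hypothetical stand-in for wherever the data ends up (a Network subclass or elsewhere), and keying on a bytes peer identifier is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class NATType(Enum):
    UNKNOWN = "unknown"
    OPEN = "open"
    SYMMETRIC = "symmetric"
    ASYMMETRIC = "asymmetric"


class PacketFilter(Enum):
    UNKNOWN = "unknown"
    ADDRESS_DEPENDENT = "address"
    PORT_DEPENDENT = "port"


@dataclass
class TraversalData:
    nat_type: NATType = NATType.UNKNOWN
    packet_filter: PacketFilter = PacketFilter.UNKNOWN

    def directly_reachable(self) -> bool:
        """Only an open connection is assumed reachable without puncturing."""
        return self.nat_type is NATType.OPEN


class TraversalRegistry:
    """Hypothetical per-peer store, keyed by a peer identifier."""

    def __init__(self) -> None:
        self._data: Dict[bytes, TraversalData] = {}

    def update(self, peer_id: bytes, data: TraversalData) -> None:
        self._data[peer_id] = data

    def get(self, peer_id: bytes) -> TraversalData:
        # Peers we know nothing about default to "unknown".
        return self._data.get(peer_id, TraversalData())
```

A walker could then filter candidates with `registry.get(peer_id).directly_reachable()` before deciding which traversal technique to try.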

Walkers: When in doubt, check the code. I looked into the current walkers, and I get the concept behind them. Like it says in the docs: If you have an interval task in your TaskManager that leads to network I/O, you should consider converting it to a DiscoveryStrategy. I'm going to have to select peers to test with. However, I'm going to have to do it based on information like: which peers haven't I tested with yet, or can I test with an asymmetric peer? I'll have to retrieve that extra information from the Network, which leads me to TraversalNetwork again. And this probably means I'll have to make a TraversalWalker to do the actual walk.

Final Thoughts:

  • I'll reuse what I can: Serialization, Overlay communication, RequestCaches, etc. I'm glad I don't have to implement that.
  • I'll extend where I need more functionality. Subclasses feel the cleanest way to do this. But only where I can't get the functionality in another way.

Quinten: I'm open to any more suggestions and/or thoughts that you have. Esp. about opening multiple endpoints, and storing the information about those endpoints.

R.

@qstokkink
Contributor

Disclaimer: there is more than one way to implement these things - don't regard the following suggestions as absolute truths. I fully support your overall conclusions and I'm only giving my perspective to try and make your life easier - feel free to ignore.

> What if I want to open 128 or 256 ports? (NAT puncturing using birthday paradox) Is there an easy way to do that? And when sending to a peer, I'll have to know which endpoint I'm trying to reach, and not just pick the preferred address.

Personally, if it's just 256 ports, I would use dynamic subclassing (using type) to create the 256 subclasses. The DispatcherEndpoint has a dictionary to couple addresses to known interfaces so with one dictionary update the endpoint mapping should be taken care of.
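A sketch of what dynamic subclassing with `type` could look like. To keep it runnable without py-ipv8, `BaseEndpoint` is a stand-in for the real UDPEndpoint, and the generated class names (`TraversalEndpoint0`, `TraversalUDPv4Address0`, ...) are invented for this example.

```python
import collections


class BaseEndpoint:
    """Stand-in for ipv8's UDPEndpoint so the sketch runs without py-ipv8."""

    def notify_listeners(self, packet):
        return packet


def make_endpoint_class(index: int):
    """Create a uniquely named endpoint subclass plus its own address type."""
    address_cls = collections.namedtuple(f"TraversalUDPv4Address{index}", ["ip", "port"])

    def notify_listeners(self, packet):
        # Rebind the source address to this endpoint's address class.
        return BaseEndpoint.notify_listeners(self, (address_cls(*packet[0]), packet[1]))

    return type(
        f"TraversalEndpoint{index}",
        (BaseEndpoint,),
        {"notify_listeners": notify_listeners, "address_class": address_cls},
    )


# One subclass per extra port, e.g. 256 for a birthday-paradox style test.
endpoint_classes = [make_endpoint_class(i) for i in range(256)]
```

Each generated class carries its own address type, so the class of an incoming address identifies the endpoint it arrived on.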

Subclassing Peer seems like it would lead to a game of whack-a-mole with you fighting IPv8 internals that create non-subclassed Peer instances.

> Correct me if I'm wrong (because I could be missing something in this dict comprehension), but doesn't that mean only the one "UDPIPv4" endpoint is stored

Correct. Each subclass needs a unique name. However, if you dynamically generate the subclasses this should not be an issue.

> My thought was something like MultiDispatcherEndpoint or TraversalEndpoint subclass that just wraps the normal DispatcherEndpoint, but adds the ability to open more endpoints.

I like this idea - it seems like a good abstraction level to manage ephemeral endpoints.

> One thing I'm still not sure about: Will I need to know what endpoint a packet arrived on?

Right now: yes. The IPv8 internals assume that the address class is uniquely tied to some endpoint. This is also why I suggest(ed) using subclasses.

> If you only have one port open, you know exactly where it arrived. If you have multiple ports open and you receive a packet on one of them, you might need extra information.

I just wanted to highlight this to explicitly confirm (because it's important). Yes. NAT mapping happens between unique source and destinations. If you send from a different port to the same address you have a high chance of being blocked again.
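The point about mappings being tied to a unique source and destination can be shown with a toy model of a symmetric NAT. This is a simplification for illustration: real boxes differ in how they allocate ports and expire mappings, and the class name and sequential port allocation are assumptions of this sketch.

```python
import itertools
from typing import Dict, Tuple

Endpoint = Tuple[str, int]


class ToySymmetricNAT:
    """Allocates a new external port for every (internal source, destination) pair."""

    def __init__(self, external_ip: str, first_port: int = 40000) -> None:
        self.external_ip = external_ip
        self._ports = itertools.count(first_port)
        self._mappings: Dict[Tuple[Endpoint, Endpoint], int] = {}

    def outbound(self, source: Endpoint, destination: Endpoint) -> Endpoint:
        """Return the external address the destination sees for this flow."""
        key = (source, destination)
        if key not in self._mappings:
            self._mappings[key] = next(self._ports)
        return (self.external_ip, self._mappings[key])

    def inbound_allowed(self, external_port: int, sender: Endpoint) -> bool:
        """A reply passes only if it comes from the destination of that mapping."""
        return any(port == external_port and dest == sender
                   for (_, dest), port in self._mappings.items())
```

In this model, sending from a second local port to the same remote address creates a second, unrelated mapping, so a reply aimed at the first mapping says nothing about the second.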

> I was thinking about storing some extra information about what type of connection each peer has (Type: Open / Symmetric / Asymmetric, PacketFilter: Port / Address dependent, etc).

The IntroductionRequest and IntroductionResponse still have bits reserved for the connection type. The past ~9 years peers have only shared "unknown" with each other though :-) Perhaps you could use that (and maybe another bit or two of the unused flag bits in the introduction logic)?

Personally, I'd just edit the Community.on_introduction_response and Community.on_introduction_request to attach this information to Peer instances instead of editing the Network. Of course, your approach would (again) also work.

> And this probably means I'll have to make a TraversalWalker to do the actual walk.

That seems like a good approach if you have custom logic, regardless of whatever you choose to do (either a Network subclass or attaching information to Peer instances).

Overall, I feel like you're avoiding editing the Peer class a bit too much. If your work is successful at covering different NAT setups, feel free to open a PR on IPv8 to attach more info to the Peer class.

@remkonaber

There are indeed multiple ways to implement things. That's exactly why I said I was open to suggestions. Always nice to hear other perspectives/ideas. So, thanks.

And you are correct. I was trying to avoid editing classes in py-ipv8. I started looking at my code with the assumption that I would use the library, not change it (or only make small changes). Which means subclass if you need more functionality. I was going to try to monkeypatch Peer if just subclassing wasn't enough, because I did notice that Peer was instanced in different places. However, based on your feedback, here is my plan:

  • Extend Peer class. Add NAT type, add port filter type, etc. Exact additions and duration of validity I'll figure out during testing of the routers. The Peer class seems the correct place to put information that is part of a peer's characteristics.

  • Extend DispatcherEndpoint: Using type for dynamic classes. I would have probably used a dict of UDPEndpoints keyed to "port number", but using dynamic classes seems an interesting method. So I'm going to use that.

  • Extend the UDPEndpoint somehow, so the community message handlers can figure out on what port a packet arrived: My first thought was that I can pass on a reference to itself in the tuple given to notify_listeners. Then I will have to adjust on_packet in Community for that so it doesn't crash on tuple unpacking. And then passing it on to the handler somehow. But, right now handlers aren't capable of receiving extra info beyond source_address and data. And those handlers have decorators that unpack payloads from data, so if I want to pass on more to the handlers, I will also have to change the decorators and then the handlers themselves.
    All of this is possible, but it is starting to look like a chain of changes in a fundamental part of py-ipv8 to make this work. I'll have to think on this more to see if there is a better way. For example: instead of passing raw data up the chain, create a UDPPacket class and pass that on. Then you can add any information you want to the received packet: source address, receiving port, time of arrival, etc. You will still have to change everything, but at least you are only passing one object around, which is easier to extend if you want to extend it again in the future.
    (Any thoughts/suggestions?)

  • Extend Network class. Either by subclassing or adding stuff. This will be used to query for certain types of peers by a walker. What I need here I'll figure out while implementing.

  • Create a community and add messages.

  • Implement various NAT testing methods: With the help of other peers, figure out own NAT/Firewall type.

  • Implement various NAT traversal methods: After figuring out both NAT/Firewall types of two peers, attempt to use a strategy to connect them.

  • Create Walker(s): The logic that will actually test the peers together, by finding a peer to test with, and communicating with it to test NAT. This might be 2 walkers: one for detection of NAT type (to figure out what type the peer has), and another walker for traversal testing, to test traversal methods.
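The UDPPacket idea from the plan above could look roughly like this. It is a sketch under assumptions: `Address`, `UDPPacket` and `ListenerRegistry` are hypothetical stand-ins and not py-ipv8 classes; only the name `notify_listeners` is borrowed from the discussion.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, NamedTuple


class Address(NamedTuple):
    """Hypothetical stand-in for a socket address."""
    ip: str
    port: int


@dataclass(frozen=True)
class UDPPacket:
    """One received datagram plus metadata, passed around as a single object."""
    source: Address
    data: bytes
    receiving_port: int  # local port the datagram arrived on
    arrival_time: float = field(default_factory=time.time)


class ListenerRegistry:
    """Hypothetical stand-in for the endpoint side that notifies listeners."""

    def __init__(self) -> None:
        self._listeners = []

    def add_listener(self, listener: Callable[[UDPPacket], None]) -> None:
        self._listeners.append(listener)

    def notify_listeners(self, packet: UDPPacket) -> None:
        # Every listener receives the whole packet object instead of a tuple,
        # so new fields can be added later without changing call signatures.
        for listener in self._listeners:
            listener(packet)
```

A handler that only cares about the payload keeps reading `packet.data`, while a NAT test handler can also look at `packet.receiving_port`.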

As for the current introduction messages: I did notice the reserved bits, but also saw they weren't actually used. I have thought about this for a while: In an ideal situation, each peer would know what type of NAT/Firewall they are behind. Then, when asking for an introduction from a peer B, peer A could send this information with the request. This could be forwarded to the new peer C in the puncture request, and peer C's NAT/Firewall type could be sent back to A in the introduction response. That way both peers A and C would be able to set up a strategy to connect to each other (perhaps with relayed help from B). And since this sounds good, I'm going to test this out in my TraversalCommunity.
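Before a peer can send its own NAT type along with an introduction request, it has to determine that type with the help of other peers. A minimal sketch of one such check, assuming helper peers report back the external address they observed for packets sent from a single local port. `Observation` and `classify_mapping` are hypothetical names, and the check only covers mapping behaviour, not filtering.

```python
from typing import Iterable, NamedTuple


class Observation(NamedTuple):
    """What one helper peer saw as our external address (hypothetical)."""
    helper: str
    external_ip: str
    external_port: int


def classify_mapping(observations: Iterable[Observation]) -> str:
    """Classify NAT mapping behaviour for packets sent from ONE local port."""
    observations = list(observations)
    helpers = {o.helper for o in observations}
    if len(helpers) < 2:
        # A single destination cannot reveal destination-dependent mappings.
        return "unknown"
    external = {(o.external_ip, o.external_port) for o in observations}
    if len(external) == 1:
        # Same external address towards every helper: cone-like behaviour.
        return "endpoint-independent"
    # The mapping changes per destination: symmetric-like behaviour.
    return "destination-dependent"
```

The resulting label is the kind of information that could travel in the introduction request, the puncture request and the introduction response described above.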

But this is a full plan. For next sprint the goal will remain: Implement the multiple UDP ports, send some messages using those ports. And also try to relay a message between two peers using a helper peer.

@remkonaber

Ok. I got done what I wanted to do last sprint. And I did some more, since I was looking for a way to send some test messages, and figured I might as well make a rudimentary Walker with a take_step out of it.

For starters, I extended DispatcherEndpoint with an extra port, creating new classes for UDPEndpoints: one static, and one I derived dynamically from that using type. Then I started using these ports to send some messages (Ping Pong). I traced the route of the UDP packet through the stack to figure out where I could have issues, and how to pass on the port it arrived on. That's when I started to fully understand your suggestion of subclassing addresses: except when using lazy_wrapper, the socket_address namedtuple is passed on to the handler for the message, providing a way to know what port it arrived on (at first I had only looked at lazy_wrapper, which passes on a peer and not socket_address). So that indeed solves that problem. And if I need signed messages, I can add a lazy_wrapper_wp (with port) decorator that does pass on the socket_address.
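A rough sketch of the two ideas combined: deriving a per-port endpoint class with `type`, and giving each one its own address subclass so a handler can tell from the address which port a packet arrived on. `SocketAddress`, `FakeUDPEndpoint` and `make_endpoint_class` are hypothetical stand-ins; nothing here is imported from py-ipv8.

```python
from typing import NamedTuple


class SocketAddress(NamedTuple):
    """Stand-in for an address namedtuple (assumption, not the py-ipv8 class)."""
    ip: str
    port: int


class FakeUDPEndpoint:
    """Stand-in endpoint that wraps packet sources in its own address class."""
    address_class = SocketAddress
    listen_port = 8090

    def wrap_source(self, ip: str, port: int) -> SocketAddress:
        return self.address_class(ip, port)


def make_endpoint_class(listen_port: int) -> type:
    """Derive an endpoint class and a matching address class with type()."""
    address_class = type(
        f"SocketAddressPort{listen_port}",
        (SocketAddress,),
        # arrival_port is a class attribute, so it costs nothing per instance.
        {"__slots__": (), "arrival_port": listen_port},
    )
    return type(
        f"UDPEndpointPort{listen_port}",
        (FakeUDPEndpoint,),
        {"address_class": address_class, "listen_port": listen_port},
    )
```

Because the derived address is still a tuple of `(ip, port)`, code that unpacks or compares addresses keeps working, and a handler can check `getattr(address, "arrival_port", None)` when it needs the port.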

Adding the extra UDPEndpoints to interfaces works, but only if I add to the mapping dispatcher.endpoint.INTERFACES as well, or it would crash in my_preferred_address. Though it feels weird to me to add them to a module variable, it makes the code run. I haven't stored extra addresses in peers yet, but I'll use the address class namedtuples there as well. This is for another sprint.

Then I played around with payloads in its various forms, and added a RelayPayload, that had a PingPayload nested in it. I used this to relay a ping pong between two peers with the help of a 3rd. This was just proof of concept for myself to see how I could relay a payload, I'll make this more generic later on. It also gave me a chance to experiment with caches and understand those better.
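To illustrate the nesting, here is a sketch of a relay payload that carries a ping payload inside it, written with only the standard library. The byte layout and all function names are assumptions made for this example; they are not the RelayPayload and PingPayload definitions from the project.

```python
import socket
import struct


def pack_ping(identifier: int) -> bytes:
    """Hypothetical ping payload: a single 2-byte identifier."""
    return struct.pack(">H", identifier)


def unpack_ping(data: bytes) -> int:
    (identifier,) = struct.unpack(">H", data)
    return identifier


def pack_relay(target_ip: str, target_port: int, inner: bytes) -> bytes:
    """Relay header: 4-byte IPv4, 2-byte port, 2-byte inner length, then inner."""
    header = socket.inet_aton(target_ip) + struct.pack(">HH", target_port, len(inner))
    return header + inner


def unpack_relay(data: bytes):
    """Return (target_ip, target_port, inner_payload_bytes)."""
    if len(data) < 8:
        raise ValueError("relay payload too short")
    target_ip = socket.inet_ntoa(data[:4])
    target_port, length = struct.unpack(">HH", data[4:8])
    inner = data[8:8 + length]
    if len(inner) != length:
        raise ValueError("truncated relay payload")
    return target_ip, target_port, inner
```

The helper peer only needs `unpack_relay` to learn where to forward the inner bytes; it never has to understand the nested ping itself.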

The only thing I haven't looked at, but what I'm also thinking about is: creating a simple DispersyBootstrapper that has the IPs of a few bootstrapper peers (part of my community) that won't be behind NAT boxes. That way I don't need to use the TUDelft bootstrapper servers, and I can change the way peers introduce themselves. I'll leave this for a future sprint as well.

Next sprint (not current), I'm going to clean up the code that I wrote. And start adding the logic behind traversal testing. The current sprint my main focus is on running code with Jenkins, goal is to get my community running on 4 lxc nodes in my staging lab using a pipeline. As usual, the master project has been updated.

@synctext
Member Author

Great to see this progress 💪👌

"overcoming hostile NAT-based networks using the birthday paradox", could be central thesis message.
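A back-of-the-envelope sketch of the birthday-paradox argument. It assumes a simplified model that is not taken from any measurement: one side holds `open_ports` mappings on distinct, uniformly random external ports, and the other side sends `probes` packets to distinct, uniformly random ports. Real NAT boxes may allocate ports very differently.

```python
import math


def hit_probability(open_ports: int, probes: int, port_space: int = 65536) -> float:
    """Exact chance that at least one probe lands on an open port (model above)."""
    if open_ports + probes > port_space:
        return 1.0  # pigeonhole: the probes cannot avoid every open port
    miss = 1.0
    for i in range(probes):
        # Probability that probe i also misses, given all earlier probes missed.
        miss *= (port_space - open_ports - i) / (port_space - i)
    return 1.0 - miss


def approx_hit_probability(open_ports: int, probes: int, port_space: int = 65536) -> float:
    """Usual birthday-style approximation: 1 - exp(-n * k / P)."""
    return 1.0 - math.exp(-open_ports * probes / port_space)
```

Under this model, 256 ports and 256 probes already give a hit chance of roughly 0.63, and the 1000-by-1000 idea from the top of this issue gives a chance above 0.999.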

@remkonaber

Previous sprint went so-so. I had some issues with my new prescription lenses I started using last week. Focusing on computer screen text simply wasn't working, so new lenses were ordered, but they didn't arrive till yesterday. However, using an old pair of glasses and big fonts I did manage to get some stuff done:

Running Jenkins pipelines with pulling from GitHub is straightforward, as I figured it would be. And with web hooks it should be possible to run code directly on the NATLab after a push or a pull request, but I'm unsure if this will work with TU Delft firewall (I can't connect to the NATLab Proxmox without tunneling with SSH first, but perhaps the firewall will allow github webhooks to run?) I'll look into this once I have Jenkins running on the NATLab.

I have also discussed with Stephen via mail about the IPv4 block in use for the routers / VMs. The IPs that the routers are currently manually assigned are still available, so I should be able to turn the routers on without IP issues. We also discussed some improvements: Right now the internet facing switch is set to unmanaged, without DHCP for the routers / VMs. However, I'm going to collect the MAC addresses, and should be able to assign MAC addresses to VMs on creation (if it works properly with Ansible, TBD), so it should be possible to switch everything to DHCP. We also discussed DNS names for the routers / nodes. This isn't needed for now, but would be a nice addition as well.

Now, for this sprint, my main goal will be: Getting the NATLab up and running again. Using the latest Proxmox software, and with the ability to power-cycle routers, so I can start using the routers with my ipv8 code.

As a side note on this: Proxmox 7.3 was released at the end of November, and I did my test setup with 7.2, so I'll be retesting my Ansible code with 7.3 this sprint. I figure it all will work, but I'll have to check that to confirm.

The master project current sprint has been updated.

@synctext
Member Author

synctext commented Feb 13, 2023

How is the progress going? Idea in collaboration with #7074: Every smartphone bootstraps in IPv8 and finds peers. By connecting to peers with public IPv4 addresses it's possible to communicate. Peers behind carrier-grade NAT are hard to connect to. Any publicly connectable peer can be asked to relay a Puncture-request-birthday-paradox. You respond to any incoming request by opening numerous sockets per second (ignoring security for now).
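A minimal sketch of the "open numerous sockets" step using only the standard library. The function names and the `b"puncture"` payload are placeholders, not an existing IPv8 message, and the sketch ignores rate limiting and security just like the idea above.

```python
import socket


def close_all(sockets) -> None:
    """Close every socket in the list."""
    for sock in sockets:
        sock.close()


def open_udp_sockets(count: int, host: str = "0.0.0.0"):
    """Bind `count` UDP sockets; port 0 lets the OS pick a free port each time."""
    sockets = []
    try:
        for _ in range(count):
            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            sockets.append(sock)
            sock.bind((host, 0))
    except OSError:
        # Do not leak the sockets that were already opened.
        close_all(sockets)
        raise
    return sockets


def send_probes(sockets, target, payload: bytes = b"puncture") -> None:
    """Send one datagram from every socket to the same (ip, port) target."""
    for sock in sockets:
        sock.sendto(payload, target)
```

Behind a NAT, each of these sockets would normally get its own external mapping once it sends a packet, which is what creates the many candidate ports for the other side to hit.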

SIM cards are coming, so hopefully we have a match for birthday paradox testing. @rahimklaber is working on FROST+reliable networking. Can the Python of NATLab communicate with Kotlin of Android devices within an IPv8 NAT-Puncture community?

@remkonaber

Next week I should be able to test the NAT boxes for NAT types. I have updated my planning on the master project page. As I said just before Xmas when I was at TU Delft: I needed a few weeks in January to sort out some other stuff (done, shown as break); afterwards I have an open calendar again. I was planning on starting up again 2 weeks ago, but unfortunately I got sick (shown as a sprint, and done).

This week my brain started working again, and I'm working on getting the Proxmox server running, so I can connect to the NAT boxes again. I'm also setting up the VMs and Jenkins so next sprint I can run code. The main plan for the following sprint is to test the boxes with ipv8.

About Kotlin / Android: I can't answer this, as I have no experience with Kotlin / Android and/or running ipv8 on Android.

@synctext
Member Author

create logic for NAT type detection 🚀 https://github.com/users/remkonaber/projects/6/views/1
👍

@synctext
Member Author

synctext commented Jul 4, 2023

@remkonaber
Student @OrestisKan has started on Android side of this project on 1st July 2023. His task is to re-create the 2013 results on an Android device, using various SIM cards and hard-coded tests to the NATbox infrastructure at TUDelft.

@remkonaber

OK. The past 2 weeks I took stock of where I was:

Before my injury, I was working on getting Jenkins working on my staging Proxmox. However, one of the issues I kept having was constant SSH connection failures while running Ansible to set it up. I tried using ControlMasters, but that also didn't help. And even with 1 connection max and pauses between calls I'm getting errors. I think it has to do with triple-nested VMs, combined with too slow hardware. As discussed with Johan over the phone, I'm dropping this for now.

In the future I might try to set up Jenkins on a VM on the NATLab server instead, to see if I don't get the SSH errors there (only double-nested VM then, and better hardware). Because the idea is great: having a Jenkins instance running on Proxmox, then using Ansible and the Jenkins Swarm plugin to set up all the VMs automatically and having them report back to Jenkins to be available as agents. Having Jenkins on the same private internal network as all the VMs would make using Jenkins much easier than using my local Jenkins as master, where I have to tunnel through 3x SSH connections to even reach the VMs.

The coming weeks I'll focus on creating the Ansible part to set up the NATLab server. Then all the VMs. I'll check the TP-Link switch, making sure all the VLANs are still set correctly. Then I'll see if I can connect to each of the NAT boxes' GUIs using tunneling through the TU Delft firewall, and port redirection. While I'm doing this, I'll make accounts for Orestis and add his SSH key so he has access to everything.

R.

@synctext
Member Author

synctext commented Jul 5, 2024

Progress meeting....

server still operational 😮 💎 😮
Host natlab.ewi.tudelft.nl: able to log in and power-switch the NAT boxes (24 devices OK, 1 fail).
However, server is now behind TUDelft firewall! Because security 🧐

Another student is working on the 5G side with Android and bought a few SIM cards

Brainstorm for experimental results, use few thousand daily users of Tribler??? They use standard IPv8. Modification of Tribler code should be avoided. Test how many peers are connectable or something more exciting? Another idea, compare standard IPv8 technique with "Remko improvements".

Big milestone 0: thesis chapter written
Big milestone 1: define experimental goal for thesis experiment.
Big milestone 2: Repeat the 2013 NAT results, but with 24 boxes:
[Image: 2013 NAT results]

@qstokkink removed this from the Backlog milestone on Aug 23, 2024