Fix failing NAT simulation tests #478

Closed
3 tasks done
tegefaulkes opened this issue Oct 10, 2022 · 5 comments
Labels
development Standard development r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices


@tegefaulkes
Contributor

tegefaulkes commented Oct 10, 2022

Specification

Recently the NAT tests have been failing. It seems to be caused by the network domain changes, but I'm unsure exactly when the problem started. So far I've looked into it, and the failure is that we can't connect to the target agent using a nodePing. The errors raised are consistent with the expected behaviour when a target is unreachable, but the tests expect the connection to succeed.

Starting from the DMZ tests: we have all the network namespacing set up, but no iptables rules mimicking a NAT. The expected outcome is that we can connect to the agent through the virtual router in the namespace. With this setup there should be nothing impeding the connection, but we still fail to connect.

Given the complexity of the tests, there are quite a few places this could be failing. It could be the networking or nodes connection code causing the problem, but that seems unlikely at this stage, since this code works in every test except the NAT tests. It could also be the network namespace setup: if the network doesn't behave as expected here, we would fail to connect.

Additional context

Tasks

  • 1. Check that the networking and nodes domain code are working as expected.
  • 2. Check that the network namespacing setup is working as expected.
  • 3. Try to find the root cause of the problem.
@tegefaulkes tegefaulkes added the development Standard development label Oct 10, 2022
@tegefaulkes tegefaulkes self-assigned this Oct 10, 2022
@tegefaulkes
Contributor Author

I've reviewed the network and nodes code relating to this. As far as I can tell, it behaves as expected for a target that can't be reached. So at this point I'm leaning towards it being an issue with how the NAT tests set up the network and namespacing.

Next steps to try:

  1. Walk back through the staging commits and isolate the commit that caused the problem. There was mention of the Nix packages being updated; could that be the cause?
  2. Check that the namespacing setup in the NAT tests is working as expected. We can add a sanity test that uses netcat in place of the agent communication. Depending on the result, that should either absolve or implicate the agents.
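A minimal sketch of the netcat-style sanity check, written in TypeScript since Polykey is a TypeScript codebase. It replaces the agents with a bare UDP echo server and a probe client. The helpers `startEchoServer` and `probe` are hypothetical, UDP as the transport under test is an assumption, and placing each side inside its own namespace (for example with `ip netns exec`) is left to the test harness. Here it runs on loopback purely to show the mechanics.

```typescript
import * as dgram from 'node:dgram';

// Hypothetical helper: a UDP echo server that sends every datagram back to
// its sender. In the NAT simulation it would stand in for the target agent,
// running inside the agent's network namespace (assumption).
function startEchoServer(host: string, port = 0): Promise<dgram.Socket> {
  return new Promise((resolve, reject) => {
    const server = dgram.createSocket('udp4');
    server.on('message', (msg, rinfo) => {
      server.send(msg, rinfo.port, rinfo.address);
    });
    server.once('error', reject);
    server.bind(port, host, () => resolve(server));
  });
}

// Hypothetical helper: send one probe datagram and wait for the echo.
// Resolves true if the same payload comes back before the timeout,
// false on timeout or on a socket error.
function probe(host: string, port: number, timeoutMs = 1000): Promise<boolean> {
  return new Promise((resolve) => {
    const client = dgram.createSocket('udp4');
    const payload = Buffer.from('ping');
    let settled = false;
    // Settle exactly once so the socket is never closed twice.
    const finish = (result: boolean): void => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      client.close();
      resolve(result);
    };
    const timer = setTimeout(() => finish(false), timeoutMs);
    client.on('message', (msg) => finish(msg.equals(payload)));
    client.on('error', () => finish(false));
    client.send(payload, port, host);
  });
}

// Usage on loopback: start the echo server, probe it, report the result.
async function main(): Promise<void> {
  const server = await startEchoServer('127.0.0.1');
  const reachable = await probe('127.0.0.1', server.address().port);
  console.log(reachable ? 'reachable' : 'unreachable');
  server.close();
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```

If the probe succeeds between the namespaces but nodePing still fails, the agents are implicated; if the probe fails too, the namespace and routing setup is the more likely culprit.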

@tegefaulkes
Contributor Author

This should take about a day to complete, though it could be sooner or longer depending on what we run into. Network issues like these have been head-scratchers in the past, so we'll see.

@CMCDragonkai
Member

We'll put this on the back burner while we bring up the testnet integration work this week. The reason for this failure may be revealed through our manual testing of the testnet.

@CMCDragonkai CMCDragonkai changed the title Fix failing NAT tests Fix failing NAT simulation tests Oct 31, 2022
@CMCDragonkai
Member

We may want to move these NAT simulation tests into https://gitlab.com/MatrixAI/Engineering/Polykey/Polykey-Simulation if they involve too much infrastructure.

I think they do use quite a bit atm.

@CMCDragonkai
Member

According to @tegefaulkes, these are now passing in #474, so I'll close this. Reopen if this still occurs; maybe the NAT infrastructure isn't to blame here.

@CMCDragonkai CMCDragonkai added the r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices label Jul 10, 2023