Node Connection Issues in Polykey-CLI & PK Agent Termination Issues #183

Closed
CryptoTotalWar opened this issue May 22, 2024 · 6 comments
Labels: bug (Something isn't working), r&d:polykey:core activity 4 (End to End Networking behind Consumer NAT Devices)

Comments

CryptoTotalWar commented May 22, 2024

Describe the bug

Node Mainnet Connection Issues

During a demonstration with @sujaycarkey, attempts to share vaults between nodes failed due to connectivity issues, even though both nodes appeared active on the network (as shown in the terminal outputs). On closer inspection, however, neither user's node appeared in the output of polykey nodes connections, and polykey nodes ping against the other node failed.

We were able to complete identities discover and trust each other's nodes (as shown in the output), but we were unable to share a vault, consistently receiving ErrorRPCTimedOut: RPC has timed out on both ends. This would make sense if both nodes had failed to connect to mainnet, but the output of polykey agent status did not indicate that.
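
To check this programmatically rather than by eye, the output of polykey nodes connections can be parsed and searched for the peer's node ID. The sketch below is a minimal, unofficial helper. It assumes the whitespace-separated column layout seen in the terminal output further down in this issue (header row first, no spaces inside a cell), and the function names parse_connections and is_connected are made up for illustration; they are not part of Polykey.

```python
def parse_connections(output: str) -> list[dict[str, str]]:
    """Parse the table printed by `polykey nodes connections` into rows.

    Assumption: the first non-empty line is the header and no cell
    contains whitespace, as in the output pasted in this issue.
    """
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def is_connected(output: str, node_id: str) -> bool:
    """Return True if node_id appears in the nodeIdEncoded column."""
    return any(
        row.get("nodeIdEncoded") == node_id
        for row in parse_connections(output)
    )


if __name__ == "__main__":
    # Output copied from this issue: two seed nodes, no direct peer.
    sample = """\
host            hostname  nodeIdEncoded                                          port  timeout  usageCount
13.239.117.143  N/A       vncm2mkk41vgp2fmplqiu1je7b2l3v6fhgltlqf5f3f85923ve0j0  1314  109870   0
3.145.86.40     N/A       v6p14qcvvftnnscuavsehu37t22vfvnhse054pbkb3ehemmjsrdh0  1314  44011    0
"""
    peer = "v4c11qv5fpq2fm3ropmma2sglfc9349jspqb1iutl3f7en1ckv500"
    print(is_connected(sample, peer))  # False
```

With the output from this issue, only the two hosts on port 1314 are listed and the peer's node ID is absent, which matches the failed ping.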

Another consideration: while it is possible that one of our NATs was symmetric (I was connected to home Wi-Fi and @sujaycarkey was connected to a university library Wi-Fi), I find it highly unlikely that both of us were behind symmetric NATs.

Minor Termination-Related Issues that are Glitchy

Additionally, commands intended to stop the Polykey agent did not terminate the process as expected. The issue seems to occur when two terminals are open (one in the foreground). It is unclear which terminal first ran polykey agent stop, but the process was stuck at "Polykey agent stopping" and could only be shut down by force-quitting it from Activity Monitor on my Mac. I have already experienced this issue a few times.

To Reproduce

  1. ...
  2. ...
  3. ...

Expected behavior

  • Nodes should show up in the connections list

  • PK agent should terminate cleanly and not get "stuck". Why does it get stuck?

Screenshots

Platform (please complete the following information)

Pablo

  • Device: [Mac]
  • OS: [macOS]
  • Version: [see output]

Sujay

  • Device: [Linux]
  • OS: [Linux Mint]
  • Version: [see output]

Additional context

### Pablo's demo terminal
rulerpablo@Pablos-MacBook-Air ~ % polykey agent status                 12:33:27
status           	LIVE
pid              	71881
nodeId           	vgijtpv0h8m1eajeir77g73muq88n5kj0413t6fjdqsv9kt8dq4pg
clientHost       	::1
clientPort       	55681
agentHost        	::
agentPort        	51457
upTime           	50
startTime        	1716318101
connectionsActive	2
nodesTotal       	7
version          	1.2.3-alpha.4-1-1
sourceVersion    	1.2.3-alpha.4
stateVersion     	1
networkVersion   	1
versionMetadata  	
  version          	0.3.1
  commitHash       	"09b655f012d17de5302d01fbc4c3e835ddf5dc81\n"
  libVersion       	1.2.3-alpha.4-1-1
  libSourceVersion 	1.2.3-alpha.4
  libStateVersion  	1
  libNetworkVersion	1
rulerpablo@Pablos-MacBook-Air ~ % polykey vaults list                  13:02:31
my-software-project	zD8XRJw2SoRoUW5e2mBR9tJ
myvault            	zD3cWJLBDEMWcbwNbjuUevo
myvault-1          	zErezdpLocYs1VRZPV3wcqS
rulerpablo@Pablos-MacBook-Air ~ % polykey secrets list my-software-project
AWS_ACCESS_KEY_ID
AWS_DEFAULT_REGION
AWS_SECRET_ACCESS_KEY
rulerpablo@Pablos-MacBook-Air ~ % polykey secrets get my-software-project:/AWS_ACCESS_KEY_ID
AKIAIOSFODNN7EXAMPLE
rulerpablo@Pablos-MacBook-Air ~ % polykey identities authenticate github.com
Navigate to the URL in order to authenticate
Use any additional additional properties to complete authentication
url     	https://github.com/login/device
userCode	BEE3-DFC9
Authenticated digital identity provider github.com
identityId	CryptoTotalWar
rulerpablo@Pablos-MacBook-Air ~ % polykey identities claim github.com:CryptoTotalWar
claimId	1565088a929cdb0c15cf647ae38dbf60
url    	https://gist.github.com/CryptoTotalWar/1565088a929cdb0c15cf647ae38dbf60
rulerpablo@Pablos-MacBook-Air ~ % polykey identities discover github.com:sujaycarkey
rulerpablo@Pablos-MacBook-Air ~ % polykey identities list              13:17:11
gestalt	
  actionsList	
  identities 	
    github.com:CryptoTotalWar	
  nodeIds    	
    vgijtpv0h8m1eajeir77g73muq88n5kj0413t6fjdqsv9kt8dq4pg	
gestalt	
  actionsList	scan,notify
  identities 	
    github.com:asuarezop	
  nodeIds    	
    v6o42hql07qr9bh25nosjrr46jloipd0ipbb42j0j98ktfndft3p0	
gestalt	
  actionsList	
  identities 	
    github.com:sujaycarkey	
  nodeIds    	
    v4c11qv5fpq2fm3ropmma2sglfc9349jspqb1iutl3f7en1ckv500	
rulerpablo@Pablos-MacBook-Air ~ % polykey identities trust github.com:sujaycarkey
rulerpablo@Pablos-MacBook-Air ~ % polykey identities list              13:18:21
gestalt	
  actionsList	
  identities 	
    github.com:CryptoTotalWar	
  nodeIds    	
    vgijtpv0h8m1eajeir77g73muq88n5kj0413t6fjdqsv9kt8dq4pg	
gestalt	
  actionsList	scan,notify
  identities 	
    github.com:asuarezop	
  nodeIds    	
    v6o42hql07qr9bh25nosjrr46jloipd0ipbb42j0j98ktfndft3p0	
gestalt	
  actionsList	notify
  identities 	
    github.com:sujaycarkey	
  nodeIds    	
    v4c11qv5fpq2fm3ropmma2sglfc9349jspqb1iutl3f7en1ckv500	
rulerpablo@Pablos-MacBook-Air ~ % polykey vaults share my-software-project v4c11qv5fpq2fm3ropmma2sglfc9349jspqb1iutl3f7en1ckv500
ErrorRPCTimedOut: RPC has timed out
rulerpablo@Pablos-MacBook-Air ~ % polykey vaults share my-software-project v4c11qv5fpq2fm3ropmma2sglfc9349jspqb1iutl3f7en1ckv500
ErrorRPCTimedOut: RPC has timed out
rulerpablo@Pablos-MacBook-Air ~ % polykey nodes connections            13:20:56
host          	hostname	nodeIdEncoded                                        	port	timeout	usageCount
13.239.117.143	N/A     	vncm2mkk41vgp2fmplqiu1je7b2l3v6fhgltlqf5f3f85923ve0j0	1314	109870 	0
3.145.86.40   	N/A     	v6p14qcvvftnnscuavsehu37t22vfvnhse054pbkb3ehemmjsrdh0	1314	44011  	0
rulerpablo@Pablos-MacBook-Air ~ % polykey agent status                 13:22:23
status           	LIVE
pid              	71881
nodeId           	vgijtpv0h8m1eajeir77g73muq88n5kj0413t6fjdqsv9kt8dq4pg
clientHost       	::1
clientPort       	55681
agentHost        	::
agentPort        	51457
upTime           	1262
startTime        	1716318101
connectionsActive	2
nodesTotal       	7
version          	1.2.3-alpha.4-1-1
sourceVersion    	1.2.3-alpha.4
stateVersion     	1
networkVersion   	1
versionMetadata  	
  version          	0.3.1
  commitHash       	"09b655f012d17de5302d01fbc4c3e835ddf5dc81\n"
  libVersion       	1.2.3-alpha.4-1-1
  libSourceVersion 	1.2.3-alpha.4
  libStateVersion  	1
  libNetworkVersion	1
rulerpablo@Pablos-MacBook-Air ~ % polykey nodes ping v4c11qv5fpq2fm3ropmma2sglfc9349jspqb1iutl3f7en1ckv500
success	false
ErrorPolykeyCLINodePingFailed: No response received
rulerpablo@Pablos-MacBook-Air ~ % polykey agent stop                   13:24:15
Agent is already stopping
rulerpablo@Pablos-MacBook-Air ~ % polykey agent stop                   15:24:18
Agent is already stopping
rulerpablo@Pablos-MacBook-Air ~ % polykey agent stop                   15:24:43
Agent is already stopping
## Sujay's Demo Terminal

[3 images: screenshots of Sujay's demo terminal]

Notify maintainers

@tegefaulkes
@amydevs
@brynblack

CryptoTotalWar added the bug label on May 22, 2024

linear bot commented May 22, 2024

@CryptoTotalWar
Author

Also, here is the stderr from polykey agent start -v while I was experiencing the agent termination problem:
sujay_demo_stderr.txt

@tegefaulkes
Contributor

The first problem is very consistent with the two of you not being able to form a connection with each other.

  1. You've both got connections to seed nodes, so you're entering the network just fine.
  2. One of you is on a public-ish university library Wi-Fi, which has a good chance of being very restrictive, if not outright symmetric NAT.
  3. Only one side needs to be behind a symmetric NAT for the hole punching procedure to fail.

All signs point to symmetric NAT being the issue. You need to rule that out before I can look into it being a Polykey problem. We'll need to implement relaying before Polykey can handle symmetric NATs.
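
The reasoning in the list above can be written down as a small decision rule. This is only an illustrative sketch of the rule as stated in this comment, not Polykey code: the NatType categories are a deliberate simplification of real NAT behaviour, and the names hole_punch_expected and needs_relay are hypothetical.

```python
from enum import Enum


class NatType(Enum):
    NONE = "none"            # publicly reachable, no NAT in the way
    CONE = "cone"            # the same external mapping is reused for every destination
    SYMMETRIC = "symmetric"  # a different external mapping per destination


def hole_punch_expected(a: NatType, b: NatType) -> bool:
    """Rule from the comment above: one symmetric side is enough to fail."""
    return NatType.SYMMETRIC not in (a, b)


def needs_relay(a: NatType, b: NatType) -> bool:
    """A pair that cannot hole punch would need a relay to communicate."""
    return not hole_punch_expected(a, b)


if __name__ == "__main__":
    # Hypothetical reading of the demo: home Wi-Fi is a cone NAT,
    # the library Wi-Fi is symmetric.
    print(hole_punch_expected(NatType.CONE, NatType.SYMMETRIC))  # False
    print(needs_relay(NatType.CONE, NatType.SYMMETRIC))          # True
```

Under this rule it does not matter that the home network is permissive: a single symmetric side already makes the direct connection fail, so it is not necessary for both networks to be symmetric.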

I had a look, and the ErrorRPCTimedOut: RPC has timed out during vaults share should already be fixed in the Polykey library. However, I don't think we've done a Polykey-CLI release that includes the fix yet. I'll do a new release in a moment.

Your 3rd problem, Minor Termination-Related Issues that are Glitchy, is different enough that it should be its own issue. I'll take a look through the logs you provided and get back to you on that.

@tegefaulkes
Contributor

Looking through the logs, it looks like the hook for shutting down never triggered. The log cuts out mid-line, as if the process was terminated. That must have been when you had to manually kill it.

This is really odd; we handle most signals to trigger stopping the agent. The only thing that should really kill it like this is a SIGKILL signal.

This might be a Mac-specific problem.
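
For anyone who hits the stuck "stopping" state again, a gentler alternative to force-quitting is to send SIGTERM first and fall back to SIGKILL only after a grace period. The following is a hedged sketch, not part of Polykey: it assumes a POSIX system, assumes the pid is the one reported by polykey agent status (71881 in the session above), and assumes SIGTERM is among the signals the agent handles. The function names and the 30-second default are arbitrary choices for illustration.

```python
import os
import signal
import time
from typing import Callable


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without delivering a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_with_escalation(
    pid: int,
    grace_seconds: float = 30.0,
    poll_seconds: float = 0.5,
    send: Callable[[int, int], None] = os.kill,
    is_alive: Callable[[int], bool] = pid_is_alive,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Ask the process to stop, then force it if it does not exit in time.

    Returns "not-running", "sigterm" (exited after SIGTERM) or
    "sigkill" (had to be killed). send, is_alive and sleep are
    injectable so the logic can be exercised without a real process.
    """
    if not is_alive(pid):
        return "not-running"
    send(pid, signal.SIGTERM)  # polite request; lets shutdown hooks run
    waited = 0.0
    while waited < grace_seconds:
        if not is_alive(pid):
            return "sigterm"
        sleep(poll_seconds)
        waited += poll_seconds
    if not is_alive(pid):
        return "sigterm"
    send(pid, signal.SIGKILL)  # cannot be caught; no cleanup will run
    return "sigkill"
```

A call such as stop_with_escalation(71881) would then report which path was taken, which is useful to note in a follow-up ticket alongside the stderr log.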

@CryptoTotalWar
Author

@tegefaulkes I will run the demo again with @sujaycarkey next time he's on a more relaxed Wi-Fi network so we can try again. You're probably right.

For reference, do we have any backlog tickets covering long-term plans for eventually handling very restrictive NATs?

On the topic of the termination bugs, I will create a separate ticket to keep track of this, and I will try to recreate the bug and document the stderr.

I think, given your investigation, it's safe to close this ticket, but if the follow-up demo with @sujaycarkey fails then I can reopen it.

@CMCDragonkai
Member

> For reference, do we have any backlog tickets covering long-term plans for eventually handling very restrictive NATs?

See MatrixAI/Polykey#713

You should also just use Linear's search to find answers quicker.

tegefaulkes closed this as not planned on May 26, 2024
CMCDragonkai added the r&d:polykey:core activity 4 (End to End Networking behind Consumer NAT Devices) label on Aug 15, 2024
Development

No branches or pull requests

3 participants