Fix: UDP direct hole punching, high CPU usage and misc. #130
After some testing with flamegraph, it turned out that swagger-ui and a few loops were the cause of the idle CPU usage. The loops include multicast broadcasting and the peerDHT trying to connect to unconnected peers. The latter looks to be the biggest source of inefficiency, so I think it should initially try to connect to those peers and, if it's not able to after maybe 3 tries, stop trying unless the user tries to connect to that peer. The inspiration for this was libp2p's 'random walk', which allows a peer to maintain its position in the DHT by randomly querying peers every now and again. Another optimisation might be to increase the interval time. |
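The "give up after maybe 3 tries unless the user asks" idea above could be sketched roughly as follows. This is only an illustration: `PeerRetryTracker` and all of its method names are hypothetical and do not come from js-polykey, and the limit of 3 is simply the figure floated in the comment.

```typescript
// Hypothetical helper, not part of js-polykey: counts failed connection
// attempts per peer so a background loop can skip peers that keep failing.
class PeerRetryTracker {
  private failures = new Map<string, number>();
  private readonly maxAttempts: number;

  constructor(maxAttempts: number = 3) {
    this.maxAttempts = maxAttempts;
  }

  // True while the peer has failed fewer than maxAttempts times.
  shouldAttempt(peerId: string): boolean {
    return (this.failures.get(peerId) ?? 0) < this.maxAttempts;
  }

  // Call after a failed background connection attempt.
  recordFailure(peerId: string): void {
    this.failures.set(peerId, (this.failures.get(peerId) ?? 0) + 1);
  }

  // Call after a successful connection; forgets earlier failures.
  recordSuccess(peerId: string): void {
    this.failures.delete(peerId);
  }

  // Call when the user explicitly asks to connect, so the loop may retry.
  resetByUser(peerId: string): void {
    this.failures.delete(peerId);
  }
}
```

A background loop would check `shouldAttempt(peerId)` before each connection attempt, so peers that have exhausted their tries cost nothing until `resetByUser` is called.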
Hmm, is there a way to optimise the connection establishment itself? Does it involve a lot of cryptography atm?
If it is an interval problem, we could do exponential backoff.
But another question is why are we trying to connect to unconnected peers atm?
(In reply by email to Robbie Cronin's comment of 24 November 2020, 5:44:29 pm AEDT, quoted above.)
|
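For reference, the exponential backoff suggested in the comment above might look like the sketch below. The function name, the 1 second base and the 60 second cap are all assumptions made for illustration; nothing here is taken from the js-polykey code.

```typescript
// Hypothetical helper: delay before the next connection attempt, doubling
// with every failed attempt and capped at maxMs.
function backoffDelayMs(
  failedAttempts: number,
  baseMs: number = 1000,
  maxMs: number = 60000,
): number {
  // Treat negative or fractional counts defensively.
  const exponent = Math.max(0, Math.floor(failedAttempts));
  return Math.min(maxMs, baseMs * 2 ** exponent);
}
```

With these assumed defaults the wait grows 1 s, 2 s, 4 s, 8 s and so on until it reaches the cap, so a peer that never answers is polled less and less often instead of at a fixed interval.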
Will this PR also fix the high CPU usage? I didn't see an issue for that. |
Can you update the title of this PR, since it seems to address a lot more than UDP hole punching? |
BTW, high CPU usage when idling is critical to mobile deployment. @gideonairex We need ways of hooking in events that occur on mobile, like 4G vs WiFi, battery vs powered, etc. These would impact any background processing or discovery processes. CPU usage for js-polykey is really important to optimize! |
If @gideonairex has experience here, maybe he can help? |
Updated title to the two major things being addressed by this PR.
|
Cool, so the plugins will be useful to trigger those events, but js-polykey then needs hooks that act as a "governor" over all of its loops, so that the consumer (Electron Polykey or Native Script Polykey) can know how to optimize its usage of js-polykey. By "governor" I mean this: |
@robert-cronin ETA for this PR? |
@gideonairex you should track this PR as well, looks like some things coming here will result in features in the frontend. |
I am hoping to be finished by Friday; I still need to schedule a meeting with @nzhang-zh to ensure the fixes to the hole punching are working correctly. |
Went a bit over schedule. The UDP hole punching was a little more complicated than anticipated but is finally working over the public internet. To keep the code review more manageable, I will be breaking this PR up so that the already solved issues can be reviewed and merged. Issues #113, #128 and #131 will be split into their own PR for social discovery and other UI/UX issues. |
@gideonairex Can you cross-review this with @robert-cronin? I will need you guys to review each other's code in accordance with the coding guidelines in the orientation repository, and to make the discussions available here too. I won't always have time to review all the code. |
@robert-cronin Make sure to get the pipeline passing. See: https://gitlab.com/MatrixAI/open-source/js-polykey/-/pipelines/229949699 It's currently failing the build. Also squash the commits into semantic commits. |
BTW are the tests intended to work via the CI? Seems like it times out. |
There is a discrepancy between the gitlab-runner on my local machine and the one actually on GitLab, and I haven't managed to figure out why yet. Hopefully I can have a look soon in the next PR. |
So in the cleanup for this PR, I remembered a bug in the connection logic that I forgot to deal with: if the direct connection is tried and fails (as in public-to-private connections), the remaining methods are never performed. I was getting around the bug before by commenting out the direct connection method so it would go straight to the hole punching. The bug seems to have something to do with my custom promiseAny utility; it almost seems to be blocking the event loop for 10 seconds, which is the default timeout set for the connectDirectly method. I have a couple of things to try: the first is to increase the timeout for the connectFirstChannel method so all options can be tried, and the second is to reduce the timeout for the connectDirectly method so the other methods can have their try. Even though Node.js code is written with async/await, the call stack is single threaded, so it still has to empty before the event loop can push the next task onto it (unless we use web workers, but that seems a little overkill). |
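One way to picture the timeout interplay described above is to give every connection method its own timeout and start them all at once, so a stalled direct attempt cannot hold up hole punching. The sketch below is an assumption-laden illustration, not the PR's promiseAny utility: `withTimeout` and `firstSuccessful` are hypothetical names, and the real connectDirectly and connectFirstChannel methods are not reproduced here.

```typescript
// Hypothetical helper: rejects if `attempt` has not settled within `ms`.
function withTimeout<T>(attempt: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms,
    );
    attempt.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Hypothetical helper: resolves with the first attempt that succeeds and
// rejects only once every attempt has failed.
function firstSuccessful<T>(attempts: Promise<T>[]): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    if (attempts.length === 0) {
      reject(new Error("no connection methods supplied"));
      return;
    }
    let failed = 0;
    for (const attempt of attempts) {
      attempt.then(resolve, () => {
        failed += 1;
        if (failed === attempts.length) {
          reject(new Error(`all ${failed} connection methods failed`));
        }
      });
    }
  });
}
```

Because the attempts are already running when they are passed in, the slowest failing method only delays the final rejection, never a success from one of the others.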
04eac31 to e3d3d22
@robert-cronin is this still relevant? |
Development occurs on gitlab now. |
This PR is specifically intended to fix a bug with UDP hole punching so that integration with Matrix-Agent can continue, but it will also fix some other important bugs:
pk secrets env
#129