Faster Tunnel crypto by re-implementing IPv8 crypto in "Rust" programming language #4567

Closed
synctext opened this issue Jun 14, 2019 · 23 comments · Fixed by #7908
Comments

@synctext
Member

synctext commented Jun 14, 2019

This is a long-term issue for re-implementing our crypto tunnel core for raw speed, a zero-copy protocol stack, and use of a work-stealing thread pool. Everything in Rust. This means leaving our ideally-suited-for-rapid-prototyping Python stack, making the code less easy to modify and freezing the wire format and behavior.

This issue partly addresses our #1 issue: Tribler anonymous downloads that are fast and secure.

This is an honours student project at TUDelft, complementary to ongoing tunnel refactoring and tweaking such as #4459.

Current code repo: https://github.com/ip-v8/rust-ipv8

@jdonszelmann
Member

@synctext could I be assigned too? Thanks!

@jdonszelmann
Member

also, could the main repo link be edited to this? https://github.com/ip-v8/rust-ipv8

synctext assigned jdonszelmann and unassigned NULLx76 and dsluijk on Jun 15, 2019
@jdonszelmann
Member

the other assignees were correct though :)

@devos50
Contributor

devos50 commented Jun 15, 2019

also, could the main repo link be edited to this? https://github.com/ip-v8/rust-ipv8

I edited the link in the first post.

@devos50
Contributor

devos50 commented Jun 15, 2019

Also, not entirely related to this ticket, but could you guys let me know when the message serialization method is working? I have an experiment where I test the market community under a high system load and it seems that one of the bottlenecks is IPv8 message serialization. If there is something faster, I can integrate it into this experiment and 1) check the speed improvement and 2) check your implementation for correctness. 👍

@jdonszelmann
Member

Yes, of course. We use the Serde library for this, which provides zero-copy serialization and is extendable to custom objects. In many cases we can basically use the default serialization strategy, since Python does the same thing with the struct library. We provide some "atoms" which correspond to Python types and which we guarantee to serialize in a specific way. An example is the Varlen16 struct, which is guaranteed to serialize to a 2-byte length prefix followed by up to 65535 bytes of data. There are also Varlen8, Varlen32 and Varlen64 structs (which aren't strictly necessary since pyipv8 doesn't use them, but adding them was easy and might prove useful). Similarly, we have a Bits struct which serializes to a u8, and so on. For these we implemented a custom Serde serializer. As long as you use these types in your structs instead of the built-in Rust types, Python is guaranteed to be able to interpret them. Serde obviously does all the hard work.
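
To make the wire format concrete, here is a minimal sketch of what a Varlen16-style atom guarantees on the wire. This is not the rust-ipv8 implementation (which goes through the custom Serde serializer); the function names and the big-endian length prefix are assumptions for illustration.

```rust
/// Illustrative only: a "varlen16"-style atom is a 2-byte length prefix
/// followed by the payload, so a Python reader can parse it with the
/// struct module. Endianness here is an assumption of this sketch.
fn encode_varlen16(data: &[u8]) -> Vec<u8> {
    assert!(data.len() <= u16::MAX as usize, "payload too large for varlen16");
    let mut out = Vec::with_capacity(2 + data.len());
    out.extend_from_slice(&(data.len() as u16).to_be_bytes());
    out.extend_from_slice(data);
    out
}

/// Returns (payload, remaining bytes), or None if the buffer is truncated.
fn decode_varlen16(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    if buf.len() < 2 {
        return None;
    }
    let len = u16::from_be_bytes([buf[0], buf[1]]) as usize;
    if buf.len() < 2 + len {
        return None;
    }
    Some((&buf[2..2 + len], &buf[2 + len..]))
}
```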

Now, about coupling this to your Python project: since we haven't even started on the Python FFI, you would have to do that yourself. That won't be easy, and it is also what we are starting on next week. So using it in the market community is something you would mostly have to figure out on your own (or wait a few more weeks).

However, verifying our serializer would be highly appreciated, so go ahead! We already have a lot of test cases, but more never hurt. More detailed explanations of the serializer can be found in the comments around it, as we document our code pretty well in my opinion. We also generate a documentation page, which can be found here.

@jdonszelmann
Member

jdonszelmann commented Jun 15, 2019

Update on the project itself: we now run our tests on Windows and macOS too, which might be useful to know. Previously, only Linux testing was performed.

@devos50
Contributor

devos50 commented Jun 15, 2019

@jonay2000 thanks for the update! It's not an urgent need but it would allow me to squeeze out more throughput under high load 👍

@jdonszelmann
Member

Yes, that would be great. Honestly, that's why we're making rust-ipv8: to remove some bottlenecks in certain communities. We hope to have a working FFI to Python around the end of August.

@jdonszelmann
Member

jdonszelmann commented Jun 15, 2019

We have run some benchmarks of the system, and they show very promising results:

Deserializing and verifying the signature of a packet takes 57 µs (microseconds) per packet.
This was a relatively small packet consisting of a header, a BinMemberAuthenticationPayload, a TimeDistributionPayload and an IntroductionRequestPayload (basically an introduction request). Real-world packets will often be larger, and since the bottleneck right now is the verification process, larger packets only improve the effective throughput.

This 57 µs figure is an average over about 500,000 runs on one thread of one core of a desktop CPU with a clock speed of 3.5 GHz. It means we can process around 3.5 megabytes per second per CPU core (almost twice that with hyperthreading). This gives a theoretical throughput of 2.8 terabytes per day on just one core of one CPU.
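
A quick back-of-the-envelope sketch of that arithmetic (the packet sizes below are assumptions, not measurements, so the per-day totals depend heavily on the real traffic mix):

```rust
// Derives packets/s and throughput from the measured 57 us per packet.
// Packet sizes are assumed for illustration: a small introduction request
// versus a roughly MTU-sized packet.
fn main() {
    let per_packet_s = 57e-6; // 57 microseconds per packet (measured)
    let packets_per_sec = 1.0 / per_packet_s; // ~17,500 packets/s on one core
    for packet_bytes in [200.0_f64, 1500.0] {
        let mb_per_sec = packets_per_sec * packet_bytes / 1e6;
        let tb_per_day = mb_per_sec * 86_400.0 / 1e6;
        println!("{packet_bytes:>6} B packets: {mb_per_sec:.1} MB/s, {tb_per_day:.2} TB/day");
    }
}
```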

Now, there will be some overhead from the communities themselves - we know that - but even if you halve this speed, it is still a great improvement over the system in place right now. However, we don't think the impact of the communities will be anywhere near that large, since signature verification is very clearly the bottleneck of the system.

Another idea, which could be implemented in the future, is batch verification. At the moment, verifying one Ed25519 signature takes 273,364 instructions on a modern x86 CPU. By waiting until multiple packets have come in before verifying, one could theoretically halve this number. Although we haven't looked into how to do this, nor do we plan to any time soon, it could greatly improve speeds. (Note: a batch of 64 Ed25519 signatures takes about 134,000 instructions per signature checked.)
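
For reference, a minimal sketch of what batch verification could look like, assuming the ed25519-dalek crate (2.x) with its optional `batch` and `rand_core` features; this is not something rust-ipv8 does, and the replies below explain why the idea was ultimately not pursued:

```rust
// Sketch: verify 64 independently signed "packets" in one batched call
// instead of 64 individual verifications.
use ed25519_dalek::{Signature, Signer, SigningKey, VerifyingKey};
use rand::rngs::OsRng;

fn main() {
    let packets: Vec<Vec<u8>> = (0..64).map(|i| format!("packet {i}").into_bytes()).collect();

    // Each "peer" signs its own packet with its own key.
    let keys: Vec<SigningKey> = (0..64).map(|_| SigningKey::generate(&mut OsRng)).collect();
    let sigs: Vec<Signature> = keys.iter().zip(&packets).map(|(k, m)| k.sign(m)).collect();
    let vks: Vec<VerifyingKey> = keys.iter().map(|k| k.verifying_key()).collect();

    // Single batched verification over all collected packets.
    let msgs: Vec<&[u8]> = packets.iter().map(|m| m.as_slice()).collect();
    ed25519_dalek::verify_batch(&msgs, &sigs, &vks).expect("batch verification failed");
}
```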

We have even noticed that we are (marginally) faster than some quite optimized alternatives to our verification process. We attribute this to the link-time optimization we do, which drastically increases compile time (two to three times) but yields a speed bonus of around 1.5% for the crypto and 50% for serialization/deserialization (though ser/de is still much faster than the crypto).

More optimization will be done and more benchmarks will certainly follow.

P.S. Note that we do true multithreading, so all quoted speeds will scale roughly with core count.

@synctext
Member Author

Great progress. Please consider integrating with Python and our complete .exe build process as early as possible; that has been identified as the cardinal pain point. For instance, Rust could merely own the UDP socket and act as a "Rust proxy", with the rest remaining in Python, as sketched below. This can be a parallel development track.
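
A very rough sketch of that proxy idea; the addresses, ports and division of work are placeholders, not an agreed design:

```rust
// Rust owns the public UDP socket and relays datagrams to a local Python
// process. In a real design, crypto and (de)serialization would happen here
// in Rust before handing payloads to Python.
use std::net::UdpSocket;

fn main() -> std::io::Result<()> {
    let public = UdpSocket::bind("0.0.0.0:8090")?; // socket facing the overlay network
    let python = "127.0.0.1:8091";                 // where the Python stack listens locally
    let mut buf = [0u8; 65535];
    loop {
        let (n, peer) = public.recv_from(&mut buf)?;
        // The peer address would also need to be forwarded so Python can reply.
        let _ = peer;
        public.send_to(&buf[..n], python)?;
    }
}
```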

@qstokkink
Contributor

By waiting until multiple packets have come in before verifying, one could theoretically halve this number.

I would advise against any buffer in a networking library. You should treat packets like hot potatoes: never hold on to them. In this case, it's better to be a bit less efficient.

Two examples: (1) overzealous use of buffers and batching led to packets having a propagation time of up to 10 seconds in Dispersy and (2) a bit more generally, the problem of bufferbloat.

@devos50
Contributor

devos50 commented Jun 16, 2019

Also, the batching mechanism in Dispersy made it a nightmare for developers to debug tests, since it was very hard to see whether a message was still in the buffer, being processed, or already processed. It was also one of the reasons these individual tests could take up to 10 seconds to complete.

I agree with @qstokkink here, we learned from buffering messages in Dispersy and it is not a mechanism I would like to see back, even if it (marginally) improves performance.

@jdonszelmann
Member

Alright, that's very clear: no batch processing. Thanks for the feedback. We wouldn't have done it for months anyway, but now we won't even research the idea.

@ichorid
Contributor

ichorid commented Jun 17, 2019

@jonay2000 , could you guys please measure the performance of processing Tunnel Community AES-GCM encrypted packets? That is where our real bottleneck is.

@jdonszelmann
Member

Will do; probably done this week.

@jdonszelmann
Member

@ichorid Does tribler/ipv8 use AES-128-GCM or AES-256-GCM?

@ichorid
Contributor

ichorid commented Jun 20, 2019

@jonay2000, I guess it's 128. You'd better ask @egbertbouman to be sure.
However, you can write tests for both. Or you can try out both and see which one fails to be decrypted by the Tribler tunnel crypto 😉

@egbertbouman
Member

@ichorid You're right, it's AES-128-GCM.
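
For anyone who wants to benchmark this, a minimal AES-128-GCM round trip in Rust could look like the sketch below. It assumes the RustCrypto aes-gcm crate (rust-ipv8 may bind a different backend); the random key/nonce handling is just the crate's default.

```rust
// AES-128-GCM encrypt/decrypt round trip: 128-bit key, 96-bit nonce,
// ciphertext carries a 16-byte authentication tag.
use aes_gcm::{
    aead::{Aead, AeadCore, KeyInit, OsRng},
    Aes128Gcm,
};

fn main() {
    let key = Aes128Gcm::generate_key(OsRng);          // 128-bit key
    let cipher = Aes128Gcm::new(&key);
    let nonce = Aes128Gcm::generate_nonce(&mut OsRng); // 96-bit nonce, unique per cell
    let ciphertext = cipher
        .encrypt(&nonce, b"tunnel cell payload".as_ref())
        .expect("encryption failed");
    let plaintext = cipher
        .decrypt(&nonce, ciphertext.as_ref())
        .expect("decryption failed");
    assert_eq!(&plaintext, b"tunnel cell payload");
}
```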

@jdonszelmann
Member

Thank you both!

@synctext
Member Author

synctext commented Oct 2, 2019

To do: schedule an update meeting.

@devos50
Contributor

devos50 commented Apr 6, 2020

Due to inactivity on this issue, I will move it to the backlog.

@synctext
Member Author

synctext commented Apr 15, 2024

160 Mbit/s download speed. Amazing work on the experimental release 😲 🚀 🎉
[Screenshot: Tribler_experimental_20MBYTE_tunnels]
