BPF program serialization/deserialization may no longer be necessary #14523
Comments
Here is the plan:
You might know this already, but memory copies in the transaction execution path are really hurting performance. I found this by casually … Currently, large account data apparently consumes more CPU cycles just for moving/copying it, while interpreted BPF execution consumes only ~6%. Note that _blake3_hash_many_avx512 is the accounts DB's problem. From #14596 (comment):
… So, I'm really looking forward to improvements in this area. :)
I would hope this is largely addressed by the copy-on-write account data change that went in a few weeks ago. @ryoqun, if you could run this again, that would be swell; or better, could you point me to how to run it like this? Do you have to build differently? I've not yet run perf on our stack. @ryoqun
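For anyone unfamiliar with the copy-on-write idea mentioned above, here is a minimal sketch using Rust's standard Arc::make_mut. It is only an illustration under assumptions: the Account type and its methods are hypothetical and are not the runtime's actual account type or the change referred to. Cloning an account shares its data buffer, reads never copy, and the buffer is copied only on the first write by a clone that still shares it.

```rust
use std::sync::Arc;

/// Hypothetical account type: `data` is shared between clones until one of
/// them writes, at which point only that clone gets its own copy.
#[derive(Clone)]
struct Account {
    data: Arc<Vec<u8>>,
}

impl Account {
    fn new(data: Vec<u8>) -> Self {
        Account { data: Arc::new(data) }
    }

    /// Read-only access never copies.
    fn data(&self) -> &[u8] {
        &self.data
    }

    /// Write access copies the buffer only if another clone still shares it.
    fn data_mut(&mut self) -> &mut Vec<u8> {
        Arc::make_mut(&mut self.data)
    }

    /// True if both accounts point at the same underlying buffer.
    fn shares_buffer_with(&self, other: &Account) -> bool {
        Arc::ptr_eq(&self.data, &other.data)
    }
}

fn main() {
    let original = Account::new(vec![0u8; 1024]);

    // Cloning only bumps a reference count; the 1 KiB buffer is shared.
    let mut working = original.clone();
    assert!(working.shares_buffer_with(&original));

    // Reading does not trigger a copy.
    assert_eq!(working.data()[0], 0);
    assert!(working.shares_buffer_with(&original));

    // The first write copies the buffer; `original` is left untouched.
    working.data_mut()[0] = 42;
    assert!(!working.shares_buffer_with(&original));
    assert_eq!(original.data()[0], 0);
    assert_eq!(working.data()[0], 42);
}
```

The point of the design is that transactions which only read large accounts pay nothing for the data, and only writers pay for one copy.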
I think he used this to connect while it is already running:
If you want to start recording from the beginning:
or Intel:
… can make a flame graph. It has a hardcoded path of https://github.com/brendangregg/FlameGraph in …
Here's an updated one:
I suspect blake3_hash disappearing has at least 2 known causes:
I suspect
latest_slot is being tackled by an already-submitted PR described in this issue. @ryoqun ran into the assert while gathering these metrics today and had to back out the PR that should have greatly reduced this percentage.
@Lichtso Hmm, unfortunately I don't have precise build info. I only compared the tip of master from around Jan 21 with one from around Apr 16. This is a very informal test, so I don't want to make you worry too much. :) If you're interested, I can run this test between the tip of master and v1.5.18. Also, the on-chain transaction composition can change the perf metrics a lot over time. So, I'm just looking at this for fun and to make sure nothing obviously bad is happening on the tip of master. Anyway, I'm hoping your massive work in #15410 will improve these perf numbers a lot. :)
Thanks, but now that I've looked at the data again, I think
Also, the refactoring I am working on shouldn't change the performance much, if at all.
To summarize how this progressed during the past half year:
@jackcmay, @jeffwashington, @aeyakovenko and I just had a recap discussion about all the related / connected issues and came up with the following design goals:
About half of these ideas existed already, and I linked their tracking IDs. I will try to come up with a concrete proposal for how the new interface should look by next week and post it here as a request for comments.
In addition, I think we can address the following issue as well when we roll out the new loader that doesn't rely on serialization:
Now that programs no longer support returning errors "gracefully" (#14500), the way data flows to and through a cross-program invocation call chain also changes and may no longer require serialization or extra copies.
There is no longer any roll-back or intermediate discarding of data. Account data now flows through the call chain linearly from one program to the next, either to the end of the chain upon success or up until an error where the data is discarded. Because of this, rather than copying data in and out of each program's environment, the data can be passed (and translated) via pointers. Each program in the call chain incrementally operates on the data, and the runtime verifies and snapshots in-between and at the end.
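To make the linear data flow concrete, here is a minimal sketch under stated assumptions: the Program signature, the verify rule (treating the first byte as an immutable tag), and the example programs are all hypothetical, not the runtime's real interface. Every program in the chain receives the same buffer by mutable reference, so nothing is copied in or out between programs; the runtime checks the data after each step, returns the buffer for commit on success, and simply drops it on the first error.

```rust
/// Hypothetical program signature: each program mutates the account data in
/// place through a borrowed slice instead of receiving a serialized copy.
type Program = fn(&mut [u8]) -> Result<(), String>;

/// Hypothetical runtime check run between programs. Here it only enforces that
/// the first byte (treated as an immutable "owner tag") was not modified.
fn verify(owner_tag: u8, data: &[u8]) -> Result<(), String> {
    if data.first() == Some(&owner_tag) {
        Ok(())
    } else {
        Err("owner tag modified".to_string())
    }
}

/// Runs the chain linearly over one buffer. On success the mutated buffer is
/// returned for commit; on the first error it is dropped (discarded).
fn run_chain(mut data: Vec<u8>, chain: &[Program]) -> Result<Vec<u8>, String> {
    let owner_tag = *data.first().ok_or("empty account")?;
    for program in chain {
        // Same buffer, passed by pointer: no copy in or out of the program.
        program(data.as_mut_slice())?;
        // The runtime verifies in between programs.
        verify(owner_tag, &data)?;
    }
    Ok(data)
}

fn increment(data: &mut [u8]) -> Result<(), String> {
    data[1] = data[1].wrapping_add(1);
    Ok(())
}

fn double(data: &mut [u8]) -> Result<(), String> {
    data[1] = data[1].wrapping_mul(2);
    Ok(())
}

fn clobber_tag(data: &mut [u8]) -> Result<(), String> {
    data[0] = 0xFF;
    Ok(())
}

fn main() {
    // [7, 1] -> increment -> [7, 2] -> double -> [7, 4]
    let chain: [Program; 2] = [increment, double];
    assert_eq!(run_chain(vec![7, 1], &chain).unwrap(), vec![7, 4]);

    // The second program violates the check, so the whole result is discarded.
    let bad: [Program; 2] = [increment, clobber_tag];
    assert!(run_chain(vec![7, 1], &bad).is_err());
}
```

Because there is no graceful error recovery mid-chain, the sketch needs no per-program snapshot copies: one buffer either survives to the end or is thrown away.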
Doing so provides the following advantages:
But it comes with the following challenges:
… repr(C)). We already have this issue when passing Instruction and AccountInfo via Invoke. New repr(C) types should be used for all data passed into and out of the programs. This means the current native entrypoint (using KeyedAccounts) is not well suited.
… repr(C), we should probably make that change all the way down to accounts so that we don't incur any more copies of the account data than are absolutely necessary.

Overall, I think this proposal does a lot for performance, stability, and clarity of the data being passed to and between programs.
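As a rough illustration of what a repr(C) type for passing account data could look like, here is a minimal sketch. The AccountView struct, its field names, and its field order are hypothetical assumptions, not the layout Solana uses. #[repr(C)] fixes field order and padding, so a separately compiled program can locate each field at a known offset, and the data field points into the runtime's own buffer so a program writes account data in place instead of into a serialized copy.

```rust
/// Hypothetical C-compatible view of one account, handed to a program by
/// pointer. `#[repr(C)]` fixes field order and padding, so a separately
/// compiled program can read it without a serialization step.
#[repr(C)]
#[allow(dead_code)]
struct AccountView {
    key: [u8; 32],   // account address
    owner: [u8; 32], // owning program id
    lamports: u64,
    data_len: u64,
    data: *mut u8, // points into the runtime's buffer; nothing is copied
    is_signer: u8,
    is_writable: u8,
}

impl AccountView {
    /// Builds a view that borrows `data` in place.
    fn new(
        key: [u8; 32],
        owner: [u8; 32],
        lamports: u64,
        data: &mut [u8],
        is_signer: bool,
        is_writable: bool,
    ) -> Self {
        AccountView {
            key,
            owner,
            lamports,
            data_len: data.len() as u64,
            data: data.as_mut_ptr(),
            is_signer: is_signer as u8,
            is_writable: is_writable as u8,
        }
    }

    /// Reconstructs the data slice from the raw parts.
    ///
    /// # Safety
    /// The buffer the view was built from must still be alive and not
    /// otherwise borrowed while the returned slice is in use.
    unsafe fn data_mut(&mut self) -> &mut [u8] {
        unsafe { std::slice::from_raw_parts_mut(self.data, self.data_len as usize) }
    }
}

/// Byte offset of a field inside `base`, computed from their addresses.
fn offset_in<T, F>(base: &T, field: &F) -> usize {
    field as *const F as usize - base as *const T as usize
}

fn main() {
    let mut buffer = vec![0u8; 8];
    let mut view = AccountView::new([1; 32], [2; 32], 500, &mut buffer, true, false);

    // repr(C) keeps declaration order: key at 0, owner at 32, lamports at 64.
    assert_eq!(offset_in(&view, &view.key), 0);
    assert_eq!(offset_in(&view, &view.owner), 32);
    assert_eq!(offset_in(&view, &view.lamports), 64);

    // A "program" writes through the view; the runtime's buffer sees it directly.
    unsafe { view.data_mut()[0] = 9 };
    assert_eq!(buffer[0], 9);
}
```

Keeping such a type repr(C) all the way down is what lets the same bytes be shared between the runtime and the program without a translation copy; the unsafe reconstruction is the price of handing out a raw pointer.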