-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SmallAtom
optimization
#372
Conversation
Pull Request Test Coverage Report for Build 7836861233
💛 - Coveralls |
This is interesting, do the bench times increase in cases where most of the atoms are large due to the extra branching to determine the type of the NodePtr? I'm also concerned about returning a reference to a temporary value with But otherwise I like this a lot |
I think there's probably more noise in those timings than one might imagine. I just basically ran it once. But, yeah, there are a few cases where there's more work done. For example, when asking whether a node is a There's also some extra work when allocating a new number. There's a check to see if it could be stored as a small int, and if so, it is.
Yeah, I'm not super happy about this. I think a ring buffer of 16 is more than enough, but maybe it's better to overshoot by more and make it 64 or so. In the long run I would like to phase out this part of the API though, in favor of This is still work in progress, as you can see from the TODO comments and missing tests |
e1913d8
to
3fee725
Compare
If I understand correctly, this is actually a |
This is some good work. The numbers in particular look fantastic. I do worry about subtle consensus changes and wonder if adding more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good enough for now on my end, I hope to improve the API a bit to ideally get rid of unsafe
and needing to use visitors, but I don't want to hold it back on that
@Rigidity pointed out that |
I applied the suggestion from @Rigidity and made The change since last version: https://github.com/Chia-Network/clvm_rs/compare/34bbdf08f7d0c59fa3312b66aa1d3173839b6866..e90cd1279ae60da5f21953486c3a454f583ca217 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm for now
…e of running programs)
These commits are best reviewed one at a time.
purpose
This patch introduces a more compact representation of small atoms. A
NodePtr
is 32 bits, 6 bits identifies the kind of node it points to (Atom or Pair), 26 bits is an index into the vector of atoms or pairs.This patch adds a 3rd kind of node,
SmallAtom
, where the value of the atom is stored in the low 26 bits directly. The benefits are:A very large number of atoms are small. In a typical CLVM program for instance, all op-codes and environment paths are small.
format
SmallAtom
only support atoms that are:The 26 bits are interpreted as an unsigned integer.
leading zeros
Since our canonical integer representation requires leading zero bytes for certain positive integers (where the most significant bit in the following byte is set), the leading zero is implied by
SmallAtom
nodes. The functionlen_for_value()
inallocator.rs
implements this behavior. It returns the length an atom would have had, if it has been allocated on the heap.Allocator::nil(), Allocator::one()
The Allocator always allocate two nodes on construction,
nil
andone
. With this patch, we no longer need to allocate these in the atom list nor on the heap. We can just constructNodePtr
representing those values directly, asSmallAtom
. In order to still enforce the same limit on the number of atoms that are allowed to be allocated by a program, we subtract 2 from the constant,MAX_NUM_ATOMS
.MAX_NUM_ATOMS
We have a (consensus critical) rule on how many atoms are allowed to be allocated by a program. Since
SmallAtom
s aren't allocated in theatom_vec
list, we need to keep a count of them separately, to ensure the limit is identical as prior to this optimization. We keep a countersmall_atoms
counting the number of small atoms we've allocated. Whenever we check theMAX_NUM_ATOMS
limit, we include this counter.atom() -> &[u8]
A major challenge with this optimization is in the Allocator's interface. It lets you ask for a pointer to the buffer of any atom. Small atoms don't have a buffer since they're not allocated on the heap. In order to stick to the current interface of the allocator, this function is preserved, using a pool of small buffers to return a pointer to.
We have a ring buffer,
temp_vec
, with a cursor to the next free slow,temp_idx
. When theatom()
is requested for a node that is aSmallAtom
, we grab the next available slot in the ring buffer, increment the cursor, print the integer into the buffer and returns a reference to it.This relies on the return values from
atom()
being short lived, and that we never, at one point, keep more than 64 of these references around.visit_node()
In order to optimize functions using integers, there's a new function on
Allocator
calledvisit_node()
, which calls a callback with the value of the node pass into it. This means we can avoid conversions from small integers toBigInt
for example.benchmarks
The results of the benchmarks in this repository follow. This is the output from
critcmp
:RPi5
Perhaps the more interesting benchmarks are the
run_generator
stress tests in thechia_rs
repository.I ran it on Mac OS (M1) and a RaspberryPi 5 machine.
My main concern was to make the worst case more palatable. Among our stress tests, that's
deep-recursion-plus
andduplicate-coin-announce
.RPi5
Mac OS (M1)