Consider alternative forms for Symbol #463
Comments
One option I've heard discussed is something similar to how Ethereum uses the first 4 bytes of a hash of function names. We could make This would work in contexts where we could easily detect collisions, such as function names, but would be harder to enforce in broader contexts like type names, enum variant names, etc. |
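As a rough illustration of the hashed-name idea, the sketch below builds a table of 4-byte selectors for a set of function names and rejects the set if two different names map to the same selector. Everything here is hypothetical: the names `selector` and `build_selector_table` are made up for this example, and FNV-1a is used only as a dependency-free stand-in hash (Ethereum uses Keccak-256, which is not in the Rust standard library).

```rust
use std::collections::HashMap;

/// FNV-1a (64-bit). A stand-in hash chosen only to keep the sketch
/// free of dependencies; it is NOT what Ethereum uses.
fn fnv1a64(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical selector: the first 4 bytes of the hash of the name.
fn selector(name: &str) -> [u8; 4] {
    let h = fnv1a64(name.as_bytes()).to_be_bytes();
    [h[0], h[1], h[2], h[3]]
}

/// Build a selector -> name table, failing with the two offending names
/// if distinct names collide. The hash is a parameter so that a different
/// function can be swapped in.
fn build_selector_table<'a, F>(
    names: &[&'a str],
    hash: F,
) -> Result<HashMap<[u8; 4], &'a str>, (&'a str, &'a str)>
where
    F: Fn(&str) -> [u8; 4],
{
    let mut table = HashMap::new();
    for &name in names {
        if let Some(prev) = table.insert(hash(name), name) {
            if prev != name {
                return Err((prev, name));
            }
        }
    }
    Ok(table)
}
```

A check like this is easy for function names because the full set is known when a contract is built. For type names or enum variant names there is no single place where the full set is visible, which is the enforcement problem mentioned above.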
Another option I've heard discussed (cc @graydon?) is that we store the full length strings in a section of the WASM and reference them somehow. The host would be able to use those references to load the full string out of the data section of the WASM so we would only ever be transmitting to/from the host a handle. |
Thanks. It seems like a lot of developers are running into the 10-char limit issue, based on Discord questions |
I think there are several viable options here but they all have certain challenges:
None of these are great but as I say I think I'd lean towards case 4 if the various wrinkles involved are all ironable-out (and if 16 or 20 chars is "enough"). WDYT? |
Investigating a little more, it seems likely rust is using an ABI similar-to (or identical-to) https://github.com/WebAssembly/tool-conventions/blob/main/BasicCABI.md which .. we could probably adapt to on the dispatch-function side. One interesting thing that occurs to me here -- especially around trying to recover the codesize difference and avoid passing around huge values with only a few bits set -- is that .. it'd potentially open up a bit of design-space for, say, a 128-bit or even 256-bit |
I think 20 characters make a huge difference over 10. I agree with all the downsides in every approach though, and I'm not convinced the solutions we know of are a good tradeoff to living with the symbols we have. |
Following this along: we could make |
This model sounds compelling. Could these values be used everywhere Symbols could be used today, or are we talking only for function inputs? |
As a hold over / alternative to making a change to RawVal/Symbol, I've opened this change that makes the error people see when their Symbol is too big, easier to digest: stellar/rs-soroban-sdk#655. |
@leighmcculloch it'd work .. I think pretty much everywhere. It might actually be hard to pass data originating in the host into the guest, like for symbols or hashes as args in incoming contract-invocation calls, since we'd need to allocate some guest memory to deposit the incoming Anyway, in general "small structs that are multiple words long" are a normal thing in Rust, in either the Rust spoken in the world of the host or the world of the guest. The awkward interface we're dealing with in soroban is just "the wasm guest-to-host and host-to-guest invocation interfaces, and whatever ABI / calling conventions make of them". We're currently dealing with that interface as a thing that's easiest to traverse with "a sequence of repr(transparent) u64 rust values" because .. that's easy to predict the mapping of / bidirectionally map: each u64 word in the Rust arg list maps 1:1 to a u64 word in the wasm invocation interfaces. A more complex structure is possible if (big if) we can figure out what Rust's going to generate on the guest side and intercept-and-unpack it in the dispatch functions on the guest side. |
(Update: yeah, it looks like we can just push the guest stack pointer from the host, so we can put things on its linear memory before calling it, so it should basically work everywhere) |
It would also generally increase guest codesize to be throwing around lots of 256-bit / 4-word values in the guest. In the host it probably won't register much, but I'd expect it could be problematic in the guest if a large majority of values are of this sort. Of course, if most values are still |
Given how much better symbol too long errors are since stellar/rs-soroban-sdk#655 I don't think we should rush the solution for this. Maybe this is a good thing to explore after we get FutureNet deployed. |
I see a lot of UDT and rich types in our examples. We're heavily using BigInt, Map, and Vec types to the point where any optimizations we may have designed for i64, u32, or i32 seem somewhat meaningless. The only small type we use frequently is u32 and that's because we use it internally in host functions like Given that, I think maybe we should rethink the 7-bit tag space. The 1-bit u63 might be better spent on something else. The u32 bit, i32 bit, and the BitSet bit also seem relatively unused, and these types could all go. That's 4 bits of the 7 bits I don't think we're leveraging enough to warrant them. Would it be a better tradeoff to have that BigRawVal more often, or host fns more often? Right now we're doing a lot of host fns instead of BigRawVal. |
Nit: it's not a "7-bit" tag-space, it's a 1 + 3 = 4-bit tag-space, with 1 primary case of u63/others then 8 other-cases switched on the next 3 bits. You're saying we can probably part with some of those cases, but it's only really useful to free up a full bit at a time, i.e. part with powers-of-2 worth. If there are 4 cases to lose, we could free up a bit and move to 2-bit other-tags say, but .. eh .. 1 extra bit doesn't win us anything in the data payloads. Anyway, I agree all this is post-FutureNet. It's something I wanted to spend a little time revising my understanding of, and I spent that time yesterday, I'm happy to set it down until late-fall / pre-finalization. This should all be fairly invisible to users, aside from "some codesize improvements, fewer size restrictions on symbols, and maybe fewer random unused datatypes". (Out of all of this I'm most interested in the various ways large-but-not-absurd numbers -- u256 / literal-32-byte-binary -- get used, and how people would want to use them if we could anticipate their preferences / support the uses adequately) |
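The "1 + 3" layout described above can be sketched as follows. This is only an illustration under assumed bit positions (low bit selects u63 vs. tagged, the next 3 bits are the tag, the remaining 60 bits are the body); the names `Decoded`, `encode_u63`, `encode_tagged` and `decode` are hypothetical and not taken from the actual codebase.

```rust
/// A decoded 64-bit value under the assumed layout.
#[derive(Debug, PartialEq)]
enum Decoded {
    /// Low bit 0: the other 63 bits are an unsigned integer.
    U63(u64),
    /// Low bit 1: a 3-bit tag (8 cases) followed by a 60-bit body.
    Tagged { tag: u8, body: u64 },
}

/// Pack a 63-bit integer; fails if the top bit of `v` is set.
fn encode_u63(v: u64) -> Option<u64> {
    if v >> 63 == 0 {
        Some(v << 1)
    } else {
        None
    }
}

/// Pack a tagged value; fails if the tag needs more than 3 bits
/// or the body needs more than 60 bits.
fn encode_tagged(tag: u8, body: u64) -> Option<u64> {
    if tag < 8 && body >> 60 == 0 {
        Some((body << 4) | ((tag as u64) << 1) | 1)
    } else {
        None
    }
}

/// Split a raw word back into its case and payload.
fn decode(raw: u64) -> Decoded {
    if raw & 1 == 0 {
        Decoded::U63(raw >> 1)
    } else {
        Decoded::Tagged {
            tag: ((raw >> 1) & 0b111) as u8,
            body: raw >> 4,
        }
    }
}
```

Under this layout, dropping four of the eight tagged cases would shrink the tag to 2 bits and grow the body from 60 to 61 bits, which is the "1 extra bit doesn't win us anything" point.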
See #584 |
As of #682 this is done, we support up to 32-char symbols (and that limit is somewhat arbitrary, just set to "something reasonable" for keys/topics/function names and local u8 buffers to hold them in no-alloc builds) |
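To illustrate the "local u8 buffers" remark, here is a minimal sketch of a fixed-size, allocation-free holder for a symbol of up to 32 characters. The type `SymbolBuf` and its methods are hypothetical, and the sketch assumes the character set is still a-zA-Z0-9_ as in the original 10-character form.

```rust
/// Assumed upper bound on symbol length, per the comment above.
const MAX_LEN: usize = 32;

/// Hypothetical fixed-size symbol buffer; needs no heap allocation.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SymbolBuf {
    bytes: [u8; MAX_LEN],
    len: usize,
}

impl SymbolBuf {
    /// Copy `s` into a stack buffer, rejecting strings that are too long
    /// or contain characters outside the assumed set a-zA-Z0-9_.
    fn new(s: &str) -> Option<Self> {
        let src = s.as_bytes();
        let valid = src
            .iter()
            .all(|b| b.is_ascii_alphanumeric() || *b == b'_');
        if src.len() > MAX_LEN || !valid {
            return None;
        }
        let mut bytes = [0u8; MAX_LEN];
        bytes[..src.len()].copy_from_slice(src);
        Some(SymbolBuf { bytes, len: src.len() })
    }

    /// View the stored characters as a string slice.
    fn as_str(&self) -> &str {
        // Only ASCII is ever stored, so this conversion cannot fail.
        core::str::from_utf8(&self.bytes[..self.len]).unwrap_or("")
    }
}
```

With a buffer like this, a name such as contract_id (11 characters) fits without the abbreviation that the 10-character limit forced.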
Symbol is a 60-bit encoding of a 10 character string with a limited character set of a-zA-Z0-9_. The value is stored in a u64 and so is efficient to pass around.
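The arithmetic behind the limit: the character set has 63 members, so with one code reserved for "no character" each position needs 6 bits, and 10 positions fill 60 bits. A minimal sketch of such a packing is below. The function names and the specific code assigned to each character are assumptions for illustration and may not match the real encoding.

```rust
/// 10 characters x 6 bits = 60 bits.
const MAX_CHARS: usize = 10;

/// Map a character to a 6-bit code in 1..=63 (0 means "no character").
/// The ordering (_ then digits, upper case, lower case) is an assumption.
fn char_code(c: char) -> Option<u64> {
    match c {
        '_' => Some(1),
        '0'..='9' => Some(2 + (c as u64 - '0' as u64)),
        'A'..='Z' => Some(12 + (c as u64 - 'A' as u64)),
        'a'..='z' => Some(38 + (c as u64 - 'a' as u64)),
        _ => None,
    }
}

/// Pack up to 10 characters into the low 60 bits of a u64.
fn encode_symbol(s: &str) -> Option<u64> {
    if s.chars().count() > MAX_CHARS {
        return None;
    }
    let mut bits = 0u64;
    for c in s.chars() {
        bits = (bits << 6) | char_code(c)?;
    }
    Some(bits)
}

/// Unpack a value produced by `encode_symbol`.
fn decode_symbol(mut bits: u64) -> Option<String> {
    let mut out = Vec::new();
    while bits != 0 {
        let code = (bits & 0x3f) as u8;
        let ch = match code {
            1 => b'_',
            2..=11 => b'0' + (code - 2),
            12..=37 => b'A' + (code - 12),
            38..=63 => b'a' + (code - 38),
            _ => return None, // code 0 inside a value is malformed
        };
        out.push(ch);
        bits >>= 6;
    }
    out.reverse();
    String::from_utf8(out).ok()
}
```

Under these assumptions, encoding contrct_id (10 characters) succeeds, while contract_id (11 characters) is rejected.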
We use it in a lot of places:
However, the 10 character limit is at times annoying. It can sometimes cause us to write code that looks like a human error, e.g. contrct_id.
We should consider other ways we could encode strings into Symbols that would give us longer strings, more descriptive function names, variants, type names, etc.
cc @tomerweller @graydon