Replace ScObject with ScOption. Unify everything under ScVal. #64
I would not say "never" to this, but I'm generally fairly opposed. Concerning "getting rid of
Concerning changing |
I think this proposal contains a couple of things we can discuss and decide on independently:
|
Except if you do that, now instead of only one Option edge in the type graph, you have an Option on any edge to a type that contains a nested Val. So like Val<Option> and Val<Option>. Because of the thing where xdrpp needs an option interposed in a recursive type cycle. |
I think that's okay. Most users don't experience any problems because of that, at least not problems that I think are worse than the nesting creates. If we want to address that problem we should be addressing it in xdrpp. I don't think doing this needs to be contingent on addressing that problem though. |
Ok. I'm .. still fairly opposed even to this step. I think it's a mistake to hide something that's fundamental to the way the system works from users. You say users don't need to know, but I think that's an overstatement. End users operating contracts from wallets and GUIs won't see it either way. Users who are writing contracts do need to know: it's part of how the system works and will be relevant when debugging. The only users who might not need to know are dapp devs who are only working in JS and not writing contract code. I don't think it's right to spend what will likely be several weeks of work moving things around and rewriting in order to optimize for them at the expense of all contract devs and the maintainers of soroban itself (the distinction absolutely runs through the implementation).
The SDK tries pretty hard, and does a pretty decent job of making it unnecessary to know this. For sure if you're debugging a Vec or a Map, all you'll see is an integer and you'll probably be confused. But I think it's likely that there will be a subset of contract developers who understand Object exists – the advanced / experts – and a larger set of folks who just haven't ever needed to know, and may not ever need to know. I think there's a benefit here, but it's a benefit with tradeoffs. Another possibility is we do build a usability layer into SDKs so that people aren't fiddling with these details, but that is also more code to build and maintain.
Nice breakdown of the two discussions, @leighmcculloch. Thanks. After reflection, I think 1 (flattened obj structure) would be a nice-to-have, and I'm not sure about 2 (option) anymore. There is related feedback from the hackathon (stellar/stellar-docs#485), so I'm clearly not the only one confused by the mismatch between the xdr and the rust type systems.
SCV_U128 = 4,
SCV_I128 = 5,

// Other Primitives
Spacing out numbers like this isn't ideal. We reuse these numbers as object code tags which means they get encoded into the instruction stream as literals. Bigger numbers => larger codesize.
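A minimal sketch of the size effect, assuming the tags end up as LEB128-encoded literals (the variable-length integer encoding WebAssembly uses for immediates). The unsigned variant is shown for simplicity, and the function name is illustrative rather than taken from any Soroban crate.

```rust
/// Encode `value` as unsigned LEB128: 7 payload bits per byte, with the
/// high bit set on every byte except the last one.
pub fn uleb128_encode(mut value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            // Last byte: continuation bit stays clear.
            out.push(byte);
            return out;
        }
        // More bytes follow: set the continuation bit.
        out.push(byte | 0x80);
    }
}
```

Any tag up to 127 fits in one byte under this encoding, while 128 and above needs at least two, so widely spaced tag numbers cost bytes at every use site.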
case SCV_ACCOUNT_ID:
    AccountID accountID;
case SCV_VEC:
    SCVec vec;
SCVec and SCMap, being recursive types here, have to be SCVec* and SCMap*.
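For readers less familiar with XDR: `T*` is XDR's optional-data syntax, and it is the usual way to break a recursive type cycle. A rough Rust analogue of that shape is sketched below. The types are simplified, hypothetical stand-ins rather than the generated bindings, and in Rust the inner `Vec` would already provide the indirection; the `Option<Box<_>>` is kept only to mirror the XDR form.

```rust
// Hypothetical, simplified stand-ins for the XDR types under review.
#[derive(Debug, Clone, PartialEq)]
pub enum ScVal {
    U32(u32),
    // Mirrors XDR `SCVec*`: optional-data, so the arm may be absent,
    // and the pointer breaks the otherwise infinitely sized recursion.
    Vec(Option<Box<ScVec>>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct ScVec(pub Vec<ScVal>);

/// Nesting depth of a value: 0 for a primitive, 1 for a flat (or absent)
/// vector, and one more for each further level of nesting.
pub fn depth(v: &ScVal) -> usize {
    match v {
        ScVal::U32(_) => 0,
        ScVal::Vec(None) => 1,
        ScVal::Vec(Some(inner)) => 1 + inner.0.iter().map(depth).max().unwrap_or(0),
    }
}
```

The `Vec(None)` case is the extra "Option on the edge" that the earlier comments in this thread are weighing.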
Another difficulty (or possibly: opportunity) with this is that The Current Plan includes adding subtypes to objects: stellar/rs-soroban-env#583. If we were to not have "object" represented in the XDR anymore, we'd either add subtypes to values, or add them case-by-case to the value cases that are object-like / want-to-have-subtypes. Or literally duplicate those cases for each subtype we have in mind at the representation level, eg have a separate ScVal::Text and ScVal::Bytes that only differ in interpretation, both have the same representation / be able to be passed to host functions expecting that representation.

One thing this conversation is making me wonder is the extent to which we want the XDR to hide representation differences. The whole point of having subtypes is to allow a single representation (eg. "bytes") to have multiple semantic interpretations (eg. "text", "wasm", etc.) Currently such representation questions are exposed in the XDR as something the user gets to see / is forced to care about. An alternative (as you seem to be suggesting here) is that the difference gets buried in the type repertoire of the Env interface.

As an illustrative example of this: the value type currently has symbol as a separate type from bytes (which we're intending to enhance with a text subtype). This has some advantages (the user has a clear sense of which XDR representations will map to which sorts of cheaper-or-more-expensive runtime representations) but also disadvantages (the user has to care about the boundary between 10 character strings of the form a-zA-Z0-9 and all other strings; they might prefer to just let the runtime treat the special case as an invisible optimization).

We could theoretically lean hard into "hide representation questions from users", possibly even making the idea of subtypes invisible to the user (eg. we have separate Value-level variants for String, Wasm, Timestamp, etc. -- each thing we treat as a mere subtype-of-some-representation-type at the runtime level).

@paulbellamy input appreciated -- this subject basically touches everything in the ecosystem any time we change it, which is why it's top of my list, should be decided ASAP.
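The symbol boundary mentioned above can be made concrete with a small sketch. It assumes the rule is exactly as stated in the comment (at most 10 characters drawn from a-zA-Z0-9); the real rule may differ, and both the function name and the `Repr` enum are hypothetical.

```rust
/// Which runtime representation a string would get under the assumed rule.
#[derive(Debug, PartialEq, Eq)]
pub enum Repr {
    /// Cheap, packed directly into the value.
    SmallSymbol,
    /// Heap-allocated bytes object behind a handle.
    BytesObject,
}

/// Assumption: a string qualifies for the cheap form when it has at most
/// 10 characters, all of them ASCII letters or digits.
pub fn choose_repr(s: &str) -> Repr {
    if s.len() <= 10 && s.bytes().all(|b| b.is_ascii_alphanumeric()) {
        Repr::SmallSymbol
    } else {
        Repr::BytesObject
    }
}
```

Hiding the distinction would mean the runtime makes this choice on the user's behalf, instead of the user picking between two XDR types.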
Is there an example of how subtypes would work in the XDR? |
What I was hinting at above is there's kinda two separate ways to approach it. One is to keep things "factored" and put the subtype values as explicit fields in each union case. So in today's code this version looks like:
And then the other way is to just duplicate the union-arm for every case we'd have a subtype for, so we get:
This has advantages and disadvantages. The main advantage is conceptual simplicity -- if we do this with your suggestion of collapsing object and value, there's still just one conceptual level of type information in the union, which makes life simpler for users. A secondary advantage is it's smaller in terms of encoded XDR. XDR will chew up 32 bits for the union switch value no matter what -- it does no bit-packing -- so we might as well use that bit-space to its fullest extent.

I think there are two disadvantages and maybe we can live with them. The XDR definition gets potentially a bit longer looking due to all the duplication. And future extension winds up having "subtypes" somewhat out-of-order in the list -- like if we add a subtype to a union-arm that's conceptually in the middle of the list, we have to extend it with the last-assigned enum value (unless we do the thing your PR does with reserving some space after each code). Maybe both of these are acceptable though. For example, if we lose 1 extra character from

(The point of subtypes existing in the first place is that some types have special handling that's different enough to warrant being described a special way -- eg. formatting text for display or such -- but are still representationally identical to their unspecialized variants, and should be allowed as arguments to all the unspecialized functions, eg. the bytes-length function should tell you the length of any bytes object, whatever flavor)
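The two approaches described above can be sketched side by side in Rust. This is only an illustration of the shapes being compared: every type, variant, and function name below is hypothetical and not taken from the actual XDR or from the snippets originally posted in this thread.

```rust
// Approach 1 ("factored"): one arm per representation, subtype as a field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BytesSubtype {
    Data,
    Text,
    Wasm,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FactoredVal {
    U32(u32),
    Bytes { subtype: BytesSubtype, data: Vec<u8> },
}

// Approach 2 ("flattened"): one arm per subtype, all sharing one payload
// representation, so there is a single level of type information.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FlatVal {
    U32(u32),
    BytesData(Vec<u8>),
    BytesText(Vec<u8>),
    BytesWasm(Vec<u8>),
}

/// An "unspecialized" function: works on any bytes flavor.
pub fn bytes_len(v: &FlatVal) -> Option<usize> {
    match v {
        FlatVal::BytesData(b) | FlatVal::BytesText(b) | FlatVal::BytesWasm(b) => Some(b.len()),
        FlatVal::U32(_) => None,
    }
}

/// The two encodings carry the same information, so conversion is lossless.
pub fn flatten(v: FactoredVal) -> FlatVal {
    match v {
        FactoredVal::U32(n) => FlatVal::U32(n),
        FactoredVal::Bytes { subtype, data } => match subtype {
            BytesSubtype::Data => FlatVal::BytesData(data),
            BytesSubtype::Text => FlatVal::BytesText(data),
            BytesSubtype::Wasm => FlatVal::BytesWasm(data),
        },
    }
}
```

In the flattened form, adding a new bytes flavor later means adding a new arm, which is where the out-of-order numbering concern comes from.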
(Also worth noting: if we split the Env-interface tag space from the SCVal tag code space -- say to allow hiding all the Env-optimized forms like static slices and maybe even symbols from the user -- we'll still need to reserve those values from the SCVal tag code space. So .. it's likely to look a little weird there anyways.) |
(Also worth noting: |
After discussion this morning we've decided to go with (or at least try going with) a version of this. And actually go quite a bit further in overhauling things:
So I spent some of the afternoon sketching the result of all of this, and came up with something roughly like so:

// We fix a maximum of 128 value types in the system for two reasons: we want to
// keep the codes relatively small (<= 8 bits) when bit-packing values into a
// u64 at the environment interface level, so that we keep many bits for
// payloads (small strings, small numeric values, slices, object handles); and
// then we actually want to go one step further and ensure (for code-size) that
// our codes fit in a single ULEB128-code byte, which means we can only use 7
// bits.
//
// We also reserve several type codes from this space because we want to _reuse_
// the SCValType codes at the environment interface level (or at least not
// exceed its number-space) but there are more types at that level, assigned to
// optimizations/special case representations of values abstract at this level.
enum SCValType
{
    SCV_BOOL = 0,
    // Code 1 reserved: at the environment level we treat bool true and
    // false as separate _types_, with false=0, true=1, because these coincide
    // with the u32 values WASM uses for bools, and it reduces codesize.

    // The null/void/unit/none type: indicates the absence of other data.
    SCV_NULL = 2,

    // 32 bits is the smallest type in WASM or XDR; no need for u8/u16.
    SCV_U32 = 3,
    SCV_I32 = 4,

    // 64 bits is naturally supported by both WASM and XDR also.
    SCV_U64 = 5,
    SCV_I64 = 6,
    SCV_U64_TIMEPOINT = 7,
    SCV_U64_DURATION = 8,
    // Codes 9, 10 reserved for small-value versions of u64/i64.

    // 128 bits is naturally supported by Rust and we use it for Soroban
    // fixed-point arithmetic prices / balances / similar "quantities".
    // These are represented in XDR as a pair of 2 u64s, unlike {u,i}256
    // which is represented as an array of 32 bytes.
    SCV_U128 = 11,
    SCV_I128 = 12,
    // Codes 13, 14 are reserved for small-value versions of u128/i128

    // 256 bits is the size of sha256 output, ed25519 keys, and the EVM machine
    // word, so for interop use we include this even though it requires a small
    // amount of Rust guest and/or host library code.
    SCV_U256 = 15,
    SCV_I256 = 16,
    // Codes 17, 18 reserved for small-value versions of u256/i256

    // TODO: possibly allocate subtypes of i64, i128 and/or u256 for
    // fixed-precision with a specific number of decimals.

    SCV_BYTES_DATA = 19,
    SCV_BYTES_TEXT = 20,
    SCV_BYTES_WASM_V0 = 21,
    SCV_BYTES_XDR_SCVAL_V0 = 22,
    // Codes 23, 24 reserved for small-array immediate SCV_BYTES_DATA and TEXT
    // (what was previously "symbol")
    // Codes 25, 26, 27, 28 reserved for constant-slice versions of SCV_BYTES_*,
    // specified by a [contractID:12][offset:22][length:22] = 56 bit payload.

    SCV_ERROR_SYSTEM = 29,
    SCV_ERROR_CUSTOM = 30,

    SCV_VEC = 31,
    SCV_MAP = 32,
    // Codes 33, 34 reserved for constant-slice versions of SCV_VEC and MAP.

    SCV_CONTRACT_CODE_REF_WASM_V0 = 35,
    SCV_CONTRACT_CODE_TOKEN_V0 = 36,
    SCV_ACCOUNT_ID = 37,
    SCV_LEDGER_KEY_CONTRACT_CODE = 38,
};

This is not a final proposal, just a step towards one to illustrate the collective set of ideas. I'm actually fairly pleased with the way this decouples representation optimizations from abstract values; for example we can do small-object and constant-slice optimization on many different types. Feedback welcome! Especially on the question of whether we ought to subtype the fixed-point-7-decimals or ERC-like 18-decimals cases (there's even potentially enough codespace here to chew up a pile of codes just specifying 0..20 decimals, though I tend to think this is pointless). The recent emphasis on the SAC suggests that the i128-with-arbitrary-decimals design-point might itself not be perfect; stellar assets have 64-bits-with-7-decimals quantities everywhere.
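To make the bit budget in the sketch above concrete, here is a hedged illustration of packing a 7-bit type code and a 56-bit payload into a u64, including the [contractID:12][offset:22][length:22] constant-slice layout named in the comments. Placing the tag in the low 8 bits is an assumption made for this example, and all function names are hypothetical; none of this is the environment's actual encoding.

```rust
const TAG_BITS: u32 = 8;
const TAG_MASK: u64 = (1 << TAG_BITS) - 1;

/// Pack a type code (< 128, so it fits one ULEB128 byte) and a payload
/// of at most 56 bits into a single u64. Assumed layout: tag in low bits.
pub fn pack(tag: u8, payload: u64) -> Option<u64> {
    if tag >= 128 || payload >> 56 != 0 {
        return None;
    }
    Some((payload << TAG_BITS) | tag as u64)
}

/// Split a packed word back into (tag, payload).
pub fn unpack(raw: u64) -> (u8, u64) {
    ((raw & TAG_MASK) as u8, raw >> TAG_BITS)
}

/// Build a constant-slice payload: 12 + 22 + 22 = 56 bits.
pub fn slice_payload(contract_id: u64, offset: u64, length: u64) -> Option<u64> {
    if contract_id >> 12 != 0 || offset >> 22 != 0 || length >> 22 != 0 {
        return None;
    }
    Some((contract_id << 44) | (offset << 22) | length)
}
```

With this layout a value such as `pack(31, handle)` carries the SCV_VEC code alongside an object handle, and a code in the reserved 25..28 range could carry a `slice_payload` instead.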
Overall, looks pretty nice.
Edit edit: Looked at the code, and we do have an scval type for that already, don't we... |
A few minor questions:
|
This looks great. One thing (sorry if this is obvious): Is the string encoding fixed? This might be less relevant at the XDR level than at the SDK level. |
@deian I was planning on going with UTF-8 here but @MonsieurNicolas has mentioned in the past it might make sense to include a built-in case for a different encoding (and of course such subtyping is just a convenience for display -- you can always put any encoding you want into a plain BYTES_DATA object) |
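As a small illustration of "subtyping is just a convenience for display", the sketch below treats text as plain bytes that are only interpreted as UTF-8 when displayed. It assumes UTF-8 is the chosen encoding, which the comment above describes as a plan rather than a decision, and the function name is hypothetical.

```rust
use std::str;

/// Interpret the payload of a hypothetical BYTES_TEXT value for display.
/// The same bytes remain valid as plain BYTES_DATA if decoding fails.
pub fn display_text(bytes: &[u8]) -> Result<&str, str::Utf8Error> {
    str::from_utf8(bytes)
}
```

Bytes that are not valid UTF-8 simply fail to decode here, and the caller can fall back to showing them as raw data.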
Closed in favor of #70 |
While investigating stellar/stellar-cli#330, it turns out there's currently no way to explicitly represent Option<Symbol> (or Option<any other primitive>) in the xdr. Instead any ScVal might be an ScStatic::Void, so everything is an optional.
The confusion I had was because ScObject gets misused as an ‘Optional’ type. ScObject is also a weird performance-abstraction-leak thing.
I’m proposing: