XDR strings may not be valid utf8 #2022
Comments
Only clients that try to render those strings should try to decode them as UTF-8. See stellar/stellar-core#2310 for background. |
The XDR specification actually requires text to be encoded as ASCII but it seems we normalized on UTF-8, and so have others, e.g. the davecgh/go-xdr package we forked. @tamirms Do you know what in the Go SDK is causing the string to be changed? Go shouldn't make any changes to the bytes when converting a We can fix the Java SDK decoding and re-encoding error by not round-tripping the Could you post a full example demonstrating the issue? I can't figure out what all the imports are, e.g. |
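As a small check of the claim that Go leaves the bytes alone, the sketch below converts a byte slice to a `string` and back. The memo bytes and the `roundTrip` helper are made up for illustration and are not taken from the SDK.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// roundTrip converts raw bytes to a Go string and back to a byte slice.
// Hypothetical helper, for illustration only.
func roundTrip(raw []byte) []byte {
	return []byte(string(raw))
}

func main() {
	// Hypothetical memo bytes: "hi" followed by 0xC3 0x28, which is not valid UTF-8.
	raw := []byte{'h', 'i', 0xC3, 0x28}

	fmt.Println(utf8.Valid(raw))                  // false
	fmt.Println(bytes.Equal(raw, roundTrip(raw))) // true
}
```

A Go `string` is an immutable byte sequence with no encoding requirement, so the plain conversion is lossless. Changes only appear once the bytes pass through something that decodes them, such as a `[]rune` conversion or a JSON encoder.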
So it's pretty clear that we've been accepting arbitrary bytes which means the The two viable solutions that I see here are:
I prefer the latter and believe the damage can be contained as most consumers use SDK utilities wrt memos, not the generated code (not sure though). SDK utils can be fixed when the xdr code is regenerated to allow for an (almost) smooth backwards compatible experience. |
Horizon actually removes all invalid UTF-8 byte sequences during ingestion because Postgres does not support inserting non-UTF-8 strings in The PR changing this is here: stellar-deprecated/horizon#353. I think it was done to support ingestion of txs with invalid UTF-8 strings because Postgres didn't accept it. Obviously not the perfect solution but I think the only alternative was a schema change devs wanted to avoid. |
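To make the lossiness concrete, here is a minimal sketch of stripping invalid UTF-8 before storage. It uses the standard library's `strings.ToValidUTF8`; the `stripInvalidUTF8` helper and the sample memo are assumptions for illustration, not Horizon's actual ingestion code.

```go
package main

import (
	"fmt"
	"strings"
)

// stripInvalidUTF8 drops every run of bytes that is not valid UTF-8.
// Hypothetical helper for illustration; not Horizon's actual code.
func stripInvalidUTF8(memo string) string {
	return strings.ToValidUTF8(memo, "")
}

func main() {
	// Hypothetical memo with one invalid byte (0xFF) in the middle.
	memo := string([]byte{'a', 0xFF, 'b'})
	cleaned := stripInvalidUTF8(memo)

	fmt.Printf("%q -> %q\n", memo, cleaned)
	fmt.Println(len(memo), len(cleaned)) // 3 2
}
```

Once the invalid byte is gone, the stored value can no longer be mapped back to the original memo bytes.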
@tomerweller I prefer solution 2. Solution 1 would require us to update https://github.com/stellar/xdrgen to fix how strings are encoded / decoded. Anything requiring us to update https://github.com/stellar/xdrgen is not ideal |
I also prefer solution 2. In the Go SDK, XDR string memos are abstracted:
So I think clients will be insulated from the change. |
@bartekn This sounds pretty bad. It means that Memo ingestion is lossy. |
Well, it means that for the (presumably small) subset of non-UTF8 memos, submitting something not technically supported by either the XDR spec or current SDK implementations leads to Horizon displaying them in a mangled way. The original data presumably lives untouched on the blockchain. I don't love it, but I'm also not sure it's very important? |
@tamirms Why are we avoiding changing xdrgen? Changing the generated code in the Java SDK means any future time we regenerate the code we lose these changes. RE: lightsail-network/java-stellar-sdk#259 (comment) |
@leighmcculloch did you see tomer's comment?
|
Just a misunderstanding on my part, resolved here: lightsail-network/java-stellar-sdk#259 (comment). |
I'm strongly opposed to modifying the XDR to convert all While technically RFC4506 says a string is ASCII bytes, I have never seen an implementation that enforced this. Moreover, NFS2 and NFS3 allow non-ASCII characters in filenames represented as strings, so we have a pretty solid precedent here. I have seen implementations that do not work with 0-valued bytes in strings, even though NUL is a valid ASCII character, just because C uses NUL-terminated strings, but this is clearly not correct. I think the current behavior is acceptable, namely to consider a string to be an array of bytes without specifying the particular encoding. The problems people are experiencing are really with language-specific representations of XDR, and should be fixed in the compiler and/or runtime. |
The Java SDK assumes that all memo texts must be valid UTF-8 strings. However, that assumption is not valid. It turns out that any sequence of bytes is allowed to appear in the memo field (see stellar/go#2022). Therefore, the correct representation for the memo field is a byte array.
Closing this issue because we opted not to change the type of all XDR string fields to |
XDR strings are defined as byte sequences where the first four bytes encode the length of the string as a big-endian unsigned integer, and the remaining bytes are the contents of the string, zero-padded to a multiple of four bytes.
https://docs.oracle.com/cd/E18752_01/html/816-1435/xdrproto-31244.html#xdrproto-38
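For reference, the wire layout can be sketched in a few lines of Go. This follows the RFC 4506 string layout (4-byte big-endian length, contents, zero padding to a 4-byte boundary) and treats the contents as opaque bytes. The function names are hypothetical and this is not the code that xdrgen or go-xdr produce.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeXDRString lays out data as an RFC 4506 string: a 4-byte big-endian
// length, the bytes themselves, then zero padding to a multiple of 4.
// Hypothetical helper for illustration.
func encodeXDRString(data []byte) []byte {
	pad := (4 - len(data)%4) % 4
	out := make([]byte, 4, 4+len(data)+pad)
	binary.BigEndian.PutUint32(out, uint32(len(data)))
	out = append(out, data...)
	return append(out, make([]byte, pad)...)
}

// decodeXDRString reads the length prefix and returns the raw contents
// without interpreting them as text.
func decodeXDRString(enc []byte) ([]byte, error) {
	if len(enc) < 4 {
		return nil, fmt.Errorf("short buffer: %d bytes", len(enc))
	}
	n := binary.BigEndian.Uint32(enc)
	if uint64(n) > uint64(len(enc)-4) {
		return nil, fmt.Errorf("length %d exceeds buffer", n)
	}
	return enc[4 : 4+int(n)], nil
}

func main() {
	// Three content bytes, the last of which is not valid UTF-8 on its own.
	enc := encodeXDRString([]byte{'h', 'i', 0xC3})
	fmt.Printf("% x\n", enc) // 00 00 00 03 68 69 c3 00

	dec, err := decodeXDRString(enc)
	fmt.Printf("% x %v\n", dec, err) // 68 69 c3 <nil>
}
```

Nothing in this layout depends on the contents being ASCII or UTF-8, which is why arbitrary bytes survive encoding and decoding at this level.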
We interpret XDR strings as UTF-8-encoded Unicode in Horizon, the Go SDK, and the Java SDK. However, not all sequences of bytes are valid UTF-8 strings.
This transaction https://horizon-testnet.stellar.org/transactions/213d52180f5e74fc4e4bc2c86740a44f98a40ab31e6718cc032e95f9084ae535 contains a memo field which is not a valid UTF-8 string.
The problematic memo causes this issue in the Java SDK: lightsail-network/java-stellar-sdk#257
Also, the memo included in the horizon response does not match the memo you get when parsing the xdr envelope using the golang libraries:
Another issue with this transaction is that if you take the envelope XDR and decode it using the java SDK, then encode the result back to XDR, you will have a different XDR value than what you started out with:
Any ideas on what we can do to deal with this problem?
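The re-encoding mismatch can be illustrated with a Go analog. This is not the Java snippet from the report; it only mimics a decoder that substitutes U+FFFD for malformed input, which is what a UTF-8 decode followed by a re-encode typically does. The `decodeThenEncode` helper and the sample bytes are assumptions for illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// decodeThenEncode mimics a decoder that turns bytes into Unicode code points
// and then re-encodes them. Go's string-to-[]rune conversion maps each
// invalid byte to U+FFFD, so the output can differ from the input.
// Hypothetical helper for illustration.
func decodeThenEncode(raw []byte) []byte {
	return []byte(string([]rune(string(raw))))
}

func main() {
	// Hypothetical memo bytes ending in 0xFF, which is never valid in UTF-8.
	raw := []byte{'o', 'k', 0xFF}
	out := decodeThenEncode(raw)

	fmt.Printf("% x\n", raw)           // 6f 6b ff
	fmt.Printf("% x\n", out)           // 6f 6b ef bf bd
	fmt.Println(bytes.Equal(raw, out)) // false
}
```

The single invalid byte becomes the three-byte encoding of U+FFFD, so both the contents and the length change, and any XDR built from the result differs from the original envelope.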