XDR strings may not be valid utf8 #2022

tamirms · 2019-12-06T20:55:42Z

XDR strings are defined as byte sequences where the first byte is the length of the string and the remaining bytes are the contents of the string.

https://docs.oracle.com/cd/E18752_01/html/816-1435/xdrproto-31244.html#xdrproto-38

We interpret XDR strings as unicode in horizon, the golang sdk, and the Java sdk. However, not all sequences of bytes are valid unicode strings.

This transaction https://horizon-testnet.stellar.org/transactions/213d52180f5e74fc4e4bc2c86740a44f98a40ab31e6718cc032e95f9084ae535 contains a memo field which is not a valid unicode string.

The problematic memo causes this issue in the Java SDK: lightsail-network/java-stellar-sdk#257

Also, the memo included in the horizon response does not match the memo you get when parsing the xdr envelope using the golang libraries:

func TestMemo(t *testing.T) {
	envelopeXDR := "AAAAAM6jLgjKjuXxWkir4M7v0NqoOfODXcFnn6AGlP+d4RxAAAAAZAAIiE4AAAABAAAAAAAAAAEAAAAcyKMl+WDSzuttWkF2DvzKAkkEqeSZ4cZihjGJEAAAAAEAAAAAAAAAAQAAAAAgECmBaDwiRPE1z2vAE36J+45toU/ZxdvpR38tc0HvmgAAAAAAAAAAAJiWgAAAAAAAAAABneEcQAAAAECeXDKebJoAbST1T2AbDBui9K0TbSM8sfbhXUAZ2ROAoCRs5cG1pRvY+ityyPWFEKPd7+3qEupavkAZ/+L7/28G"
	envelope := xdr.TransactionEnvelope{}
	err := xdr.SafeUnmarshalBase64(envelopeXDR, &envelope)
	if err != nil {
		t.Fatal(err)
	}

	memo, _ := envelope.Tx.Memo.GetText()
	response, err := http.DefaultClient.Get(
		"https://horizon-testnet.stellar.org/transactions/213d52180f5e74fc4e4bc2c86740a44f98a40ab31e6718cc032e95f9084ae535",
	)
	if err != nil {
		t.Fatal(err)
	}
	defer response.Body.Close()
	bodyBytes, err := ioutil.ReadAll(response.Body)
	if err != nil {
		t.Fatal(err)
	}
	tx := protocol.Transaction{}
	err = json.Unmarshal(bodyBytes, &tx)
	if err != nil {
		t.Fatal(err)
	}
	// the test fails here
	if tx.Memo != memo {
		t.Fatalf("'%s' != '%s'", tx.Memo, memo)
	}
}

Another issue with this transaction is that if you take the envelope XDR and decode it using the java SDK, then encode the result back to XDR, you will have a different XDR value than what you started out with:

    @Test
    public void testRoundtrip() throws IOException {
        String txBody = "AAAAAM6jLgjKjuXxWkir4M7v0NqoOfODXcFnn6AGlP+d4RxAAAAAZAAIiE4AAAABAAAAAAAAAAEAAAAcyKMl+WDSzuttWkF2DvzKAkkEqeSZ4cZihjGJEAAAAAEAAAAAAAAAAQAAAAAgECmBaDwiRPE1z2vAE36J+45toU/ZxdvpR38tc0HvmgAAAAAAAAAAAJiWgAAAAAAAAAABneEcQAAAAECeXDKebJoAbST1T2AbDBui9K0TbSM8sfbhXUAZ2ROAoCRs5cG1pRvY+ityyPWFEKPd7+3qEupavkAZ/+L7/28G";
        BaseEncoding base64Encoding = BaseEncoding.base64();
        byte[] bytes = base64Encoding.decode(txBody);

        TransactionEnvelope transactionEnvelope = TransactionEnvelope.decode(new XdrDataInputStream(new ByteArrayInputStream(bytes)));
        ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream();

        transactionEnvelope.encode(new XdrDataOutputStream(byteOutputStream));
        String serialized = base64Encoding.encode(byteOutputStream.toByteArray());
        // fails here
        assertEquals(serialized, txBody);
    }

Any ideas on what we can to deal with this problem?

MonsieurNicolas · 2019-12-06T22:02:32Z

only clients that try to render those strings should try to decode them as utf8. see stellar/stellar-core#2310 on background

leighmcculloch · 2019-12-06T23:41:30Z

The XDR specification actually requires text to be encoded as ASCII but it seems we normalized on UTF-8, and so have others, e.g. the davecgh/go-xdr package we forked.

@tamirms Do you know what in the Go SDK is causing the string to be changed? Go shouldn't make any changes to the bytes when converting a []byte to a string. Even though all strings in Go are UTF-8, the bytes should remain intact through casting in that language at least.

We can fix the Java SDK decoding and re-encoding error by not round-tripping the byte[] through a String. I made a suggestion about that here: lightsail-network/java-stellar-sdk#257 (comment).

Could you post a full example demonstrating the issue? I can't figure out what all the imports are, e.g. protocol. It would be helpful to use https://play.golang.org or https://repl.it.

tomerweller · 2019-12-07T00:29:16Z

So it's pretty clear that we've been accepting arbitrary bytes which means the MEMO_TEXT field meant MEMO_BINARY.

The two viable solutions that I see here are:

Modify our XDR libs to represent Strings as byte-arrays and have special behavior for encoding/decoding un-renderable bytes (non utf8). This will probably have the least effect on consumers but is pretty off-spec for XDR and thus incompatible with third party XDR libs. (//UPDATE: according to dm other xdr implementations also allow arbitrary bytes in string types)
Modify our core XDR definitions to match the current behavior and spec. This means migrating the type from an XDR String type to an XDR Variable Length Opaque type. This is completely backwards compatible on the wire but will break consumers of the generated code.

I prefer the latter and believe the damage can be contained as most consumers use SDK utilities wrt memos, not the generated code (not sure though). SDK utils can be fixed when the xdr code is regenerated to allow for an (almost) smooth backwards compatible experience.

bartekn · 2019-12-09T12:09:46Z

This transaction https://horizon-testnet.stellar.org/transactions/213d52180f5e74fc4e4bc2c86740a44f98a40ab31e6718cc032e95f9084ae535 contains a memo field which is not a valid unicode string.

Horizon actually removes all invalid UTF-8 byte sequences during ingestion because Postgres does not support inserting non-UTF-8 strings in character fields: https://github.com/stellar-deprecated/horizon/blame/master/src/github.com/stellar/horizon/db2/core/transaction.go#L57

The PR changing this is here: stellar-deprecated/horizon#353. I think it was done to support ingestion of txs with invalid UTF-8 strings because Postgres didn't accept it. Obviously not the perfect solution but I think the only alternative was a schema change devs wanted to avoid.

tamirms · 2019-12-09T15:54:10Z

@tomerweller I prefer solution 2. Solution 1 would require us to update https://github.com/stellar/xdrgen to fix how strings are encoded / decoded. Anything requiring us to update https://github.com/stellar/xdrgen is not ideal

ire-and-curses · 2019-12-09T19:42:59Z

I also prefer solution 2. In the Go SDK, XDR string memos are abstracted:

type Memo struct {
	Type    MemoType
	Text    *string `xdrmaxsize:"28"`
	Id      *Uint64
	Hash    *Hash
	RetHash *Hash
}

So I think clients will be insulated from the change.

tomerweller · 2019-12-09T21:48:48Z

Horizon actually removes all invalid UTF-8 byte sequences during ingestion because Postgres does not support inserting non-UTF-8 strings

@bartekn This sounds pretty bad. It means that Memo ingestion is lossy.

ire-and-curses · 2019-12-09T22:07:11Z

@bartekn This sounds pretty bad. It means that Memo ingestion is lossy.

Well, it means that for the (presumably small) subset of non-UTF8 memos, submitting something not technically supported by either the XDR spec or current SDK implementations leads to Horizon displaying them in a mangled way. The original data presumably lives untouched on the blockchain. I don't love it, but I'm also not sure it's very important?

leighmcculloch · 2019-12-09T22:44:03Z

Anything requiring us to update https://github.com/stellar/xdrgen is not ideal

@tamirms Why are we avoiding changing xdrgen? Changing the generated code in the Java SDK means any future time we regenerate the code we lose these changes. RE: lightsail-network/java-stellar-sdk#259 (comment)

tamirms · 2019-12-09T22:47:07Z

@leighmcculloch did you see tomer's comment?

Modify our core XDR definitions to match the current behavior and spec. This means migrating the type from an XDR String type to an XDR Variable Length Opaque type. This is completely backwards compatible on the wire but will break consumers of the generated code.

leighmcculloch · 2019-12-09T22:53:59Z

Just a misunderstanding on my part, resolved here: lightsail-network/java-stellar-sdk#259 (comment).

stanford-scs · 2019-12-10T19:10:23Z

I'm strongly opposed to modifying the XDR to convert all strings to opaque, as this throws away valuable semantic information about the protocol, and will make debugging harder.

While technically RFC4506 says a string is ASCII bytes, I have never seen an implementation that enforced this. Moreover, NFS2 and NFS3 allow non-ASCII characters in filenames represented as strings, so we have a pretty solid precedent here. I have seen implementations that do not work with 0-valued bytes in strings, even though NUL is a valid ASCII character, just because C uses NUL-terminated strings, but this is clearly not correct. I think the current behavior is acceptable, namely to consider a string to be an array of bytes without specifying the particular encoding.

The problems people are experiencing are really with language-specific representations of XDR, and should be fixed in the compiler and/or runtime.

The Java SDK assumes that all memo texts must be valid UTF 8 strings. However, that assumption is not valid. It turns out that any sequence of bytes is allowed to appear in the memo field (see stellar/go#2022 ). Therefore, the correct representation for the memo field is a byte array.

tamirms · 2020-01-17T19:39:40Z

Closing this issue because we opted not to change the type of all XDR string fields to opaque. Instead, the Java XDR generator has been modified to represent XDR strings as byte arrays: stellar/xdrgen#48

tamirms mentioned this issue Dec 9, 2019

Use bytes to represent memo text lightsail-network/java-stellar-sdk#259

Merged

tamirms mentioned this issue Dec 10, 2019

Represent XDR strings as bytes stellar/xdrgen#48

Merged

tamirms closed this as completed Jan 17, 2020

tamirms mentioned this issue Apr 6, 2020

Add memo text base64 field in Horizon transaction response #2458

Closed

tamirms mentioned this issue Apr 14, 2020

protocols/horizon: Add memo_bytes field to transaction response #2485

Merged

7 tasks

overcat mentioned this issue Aug 8, 2021

Need to clarify the description of MEMO_TEXT. stellar/new-docs#555

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

XDR strings may not be valid utf8 #2022

XDR strings may not be valid utf8 #2022

tamirms commented Dec 6, 2019

MonsieurNicolas commented Dec 6, 2019

leighmcculloch commented Dec 6, 2019 •

edited

Loading

tomerweller commented Dec 7, 2019 •

edited

Loading

bartekn commented Dec 9, 2019 •

edited

Loading

tamirms commented Dec 9, 2019

ire-and-curses commented Dec 9, 2019

tomerweller commented Dec 9, 2019

ire-and-curses commented Dec 9, 2019 •

edited

Loading

leighmcculloch commented Dec 9, 2019

tamirms commented Dec 9, 2019

leighmcculloch commented Dec 9, 2019 •

edited

Loading

stanford-scs commented Dec 10, 2019

tamirms commented Jan 17, 2020

XDR strings may not be valid utf8 #2022

XDR strings may not be valid utf8 #2022

Comments

tamirms commented Dec 6, 2019

MonsieurNicolas commented Dec 6, 2019

leighmcculloch commented Dec 6, 2019 • edited Loading

tomerweller commented Dec 7, 2019 • edited Loading

bartekn commented Dec 9, 2019 • edited Loading

tamirms commented Dec 9, 2019

ire-and-curses commented Dec 9, 2019

tomerweller commented Dec 9, 2019

ire-and-curses commented Dec 9, 2019 • edited Loading

leighmcculloch commented Dec 9, 2019

tamirms commented Dec 9, 2019

leighmcculloch commented Dec 9, 2019 • edited Loading

stanford-scs commented Dec 10, 2019

tamirms commented Jan 17, 2020

leighmcculloch commented Dec 6, 2019 •

edited

Loading

tomerweller commented Dec 7, 2019 •

edited

Loading

bartekn commented Dec 9, 2019 •

edited

Loading

ire-and-curses commented Dec 9, 2019 •

edited

Loading

leighmcculloch commented Dec 9, 2019 •

edited

Loading