docs(adr): Textual switch to UTF-8 #13163

Closed
wants to merge 7 commits into from

Conversation

amaury1093
Contributor

Description

Closes: #XXXX


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title
  • added ! to the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • followed the guidelines for building modules
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • included comments for documenting Go code
  • updated the relevant documentation or specification
  • reviewed "Files changed" and left comments if necessary
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic
  • reviewed API design and naming
  • reviewed documentation is accurate
  • reviewed tests and test coverage
  • manually tested (if applicable)

@@ -51,11 +51,71 @@ This also prevents users signing over any hashed transaction data (fee, transact

We propose to maintain functional tests using bijectivity in the SDK.

- ### 2. Only ASCII 32-127 characters allowed
+ ### 2. UTF-8 characters allowed, but signing devices MAY convert them before display

Contributor Author

Here's a relevant thread on why we chose ASCII in the first place: #10701 (comment)

Contributor

Probably should explain this in a paragraph and not directly in a markdown header?

- Ledger devices have limited character display capabilities, so all strings MUST only contain ASCII characters in the 32-127 range.
+ The SIGN_MODE_TEXTUAL specification allows all UTF-8 characters. The textual strings will contain all characters as-is, with the modifications below:
+ - the line feed `\n` character (ASCII: 10) is escaped using quotation marks: `"\n"`. This is to disambiguate with the `\n` control character used to signal a screen change on the signing device.
+ - the quotation mark character `"` (ASCII: 34) is escaped with a backslash prefix: `\"`. This is to allow bijectivity if the signing device decides to convert UTF-8 characters into its own set of displayable characters, using `"` as a control character.

Contributor

It's actually the backslash character \, not quotation marks, that needs to be quoted in order to legibly and bijectively quote newlines. As written, this allows all other ASCII control characters, e.g. nul, bel, etc, to transmit as themselves despite having no printable representation, even on Unicode-enabled devices.

I strongly recommend following an established standard quotation algorithm rather than trying to invent something new. There are a large number of existing designs that follow the basic pattern of backslash followed by:

  • b: backspace
  • f: form feed
  • n: newline
  • r: carriage return
  • t: horizontal tab
  • v: vertical tab
  • 0: nul
  • \: backslash
  • some means of quoting other control characters by number

These transformations ought to be part of the textual spec, rather than relegated to the Ledger, because they're necessary for bijectivity and accurate legible quoting.
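
For illustration, the backslash scheme listed above could be sketched in Go roughly as follows. This is not code from the SDK or from this PR: `escapeText`, `unescapeText`, and the two lookup tables are hypothetical names, and the `\xNN` form is only one assumed way of "quoting other control characters by number". The sketch works byte by byte and only treats ASCII control characters (0-31 and 127) specially; all other bytes, including multi-byte UTF-8 sequences and quotation marks, pass through unchanged.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// shortEscapes maps a raw byte to the letter that follows the backslash
// (hypothetical table, mirroring the list in the comment above).
var shortEscapes = map[byte]byte{
	'\\': '\\', '\b': 'b', '\f': 'f', '\n': 'n',
	'\r': 'r', '\t': 't', '\v': 'v', 0: '0',
}

// longForms is the reverse table, used when decoding.
var longForms = func() map[byte]byte {
	m := make(map[byte]byte, len(shortEscapes))
	for raw, letter := range shortEscapes {
		m[letter] = raw
	}
	return m
}()

// escapeText backslash-escapes the backslash itself and ASCII control bytes.
func escapeText(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		c := s[i]
		if letter, ok := shortEscapes[c]; ok {
			b.WriteByte('\\')
			b.WriteByte(letter)
		} else if c < 0x20 || c == 0x7f {
			// Assumed numeric form for the remaining control bytes.
			fmt.Fprintf(&b, `\x%02x`, c)
		} else {
			b.WriteByte(c)
		}
	}
	return b.String()
}

// unescapeText reverses escapeText and rejects malformed escapes.
func unescapeText(s string) (string, error) {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] != '\\' {
			b.WriteByte(s[i])
			continue
		}
		i++
		if i >= len(s) {
			return "", errors.New("dangling backslash")
		}
		if s[i] == 'x' {
			if i+2 >= len(s) {
				return "", errors.New("truncated \\x escape")
			}
			n, err := strconv.ParseUint(s[i+1:i+3], 16, 8)
			if err != nil {
				return "", err
			}
			b.WriteByte(byte(n))
			i += 2
			continue
		}
		raw, ok := longForms[s[i]]
		if !ok {
			return "", fmt.Errorf("unknown escape \\%c", s[i])
		}
		b.WriteByte(raw)
	}
	return b.String(), nil
}

func main() {
	escaped := escapeText("foo\nbar")
	back, err := unescapeText(escaped)
	fmt.Println(escaped, back == "foo\nbar", err)
}
```

Because the backslash is itself escaped, a memo containing a real line feed and a memo containing the two characters `\` and `n` produce different escaped strings, which is what makes the round trip through `unescapeText` possible.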

Contributor Author

Oh, OK. I completely misunderstood you during our Monday call: when you mentioned "quotation", I understood quotation marks, hence this PR. A lot of this needs to be rewritten then.

Memo: foo

// JSON: {"memo": "\"foo\""}
Memo: \"foo\"

Contributor

Since we don't use quotation marks as metacharacters, we could also just say the following without risk of ambiguity:

Memo: "foo"

Memo: \"foo\"

// JSON: {"memo": "foo\nbar"}
Memo: foo"\n"bar // Where \n is the single line-feed character

Contributor

Or foo\nbar.
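
A small Go sketch of how the memo lines in this thread might be rendered under the backslash-only suggestion. It is an illustration, not the SDK's renderer: `renderMemo` and `memoEscaper` are hypothetical names, and only the backslash and the line feed are handled here to keep the example short.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// memoEscaper escapes the backslash and the line feed. Quotation marks are
// left alone because they are not used as metacharacters in this scheme.
var memoEscaper = strings.NewReplacer(`\`, `\\`, "\n", `\n`)

// renderMemo decodes a JSON object with a "memo" field and returns the
// single display line for it (hypothetical helper).
func renderMemo(rawJSON []byte) (string, error) {
	var tx struct {
		Memo string `json:"memo"`
	}
	if err := json.Unmarshal(rawJSON, &tx); err != nil {
		return "", err
	}
	return "Memo: " + memoEscaper.Replace(tx.Memo), nil
}

func main() {
	inputs := []string{
		`{"memo": "\"foo\""}`,   // memo contains quotation marks
		`{"memo": "foo\nbar"}`,  // memo contains a real line feed
		`{"memo": "foo\\nbar"}`, // memo contains a backslash followed by 'n'
	}
	for _, in := range inputs {
		line, err := renderMemo([]byte(in))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(line)
	}
}
```

The last two inputs render as different lines, so the displayed text still distinguishes a line feed from a literal backslash followed by `n`.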

@amaury1093
Contributor Author

closing in favor of #13434

amaury1093 closed this on Oct 11, 2022
amaury1093 deleted the am/textual-utf8 branch on October 11, 2022, 09:05
3 participants