Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(pass-style): feature flag: only well-formed strings are passable #2002

Merged
merged 2 commits into from
Mar 13, 2024

Conversation

erights
Copy link
Contributor

@erights erights commented Jan 24, 2024

closes: #1739
refs: ocapn/ocapn#47 https://github.com/tc39/proposal-is-usv-string

Description

At an OCapN meeting, we resolved ocapn/ocapn#47 by deciding that only well-formed Unicode strings may be passed. A well-formed Unicode string does not contain any unpaired surrogates. It consists only of a sequence of Unicode code points. A conforming OCapN implementation MUST only emit well-formed Unicode strings, and MUST validate that incoming strings are well-formed, and reject those that are not.

Within the Agoric implementation, the way to implement these restrictions is by having passStyleOf(str) only judge well-formed string to be passable, rejecting all other strings as not passable.

If the underlying engine does not yet implement isWellFormed, we fall back to @gibson042 's shim implementation at ocapn/ocapn#47 (comment) .

Update: Because we do not yet know the performance impact, this PR hides this new feature behind a feature flag

export ONLY_WELL_FORMED_STRINGS_PASSABLE=enabled

that defaults to disabled. Unless enabled, there should be no observable difference for now, i.e., any JavaScript string would still be considered Passable. See this PR's NEWS.md update for more.

Security Considerations

If enabled, this change improves integrity because it reduces the attack surface. By rejecting incoming strings that are not well-formed, a counterparty cannot use such a string to push internal algorithms into cases they may not have tested.

Scaling Considerations

Before this PR or with this feature disabled (currently the default), passStyleOf would judge a string to be Passable based simply on typeof being 'string', which is O(1). With this feature enabled, the check will often be O(n) in the length of the string, depending on whether and how the underlying engine implements isWellFormed. If the underlying engine does not yet implement it, our shim implementation takes O(n). In theory, a builtin implementation might remember that a string is well-formed, enabling an O(1) test, at least after the first time. However, we are not aware of any such engine optimizations.

When the argument to passStyleOf is an object that it judges Passable, passStyleOf memoizes its judgement so it need only make the expensive check once. However, because of the impossibility of having a user-level weak data structure weakly indexed by strings, it is impossible for user-level code to do such memoization for strings. We should measure to see if this is a problem in practice.

Documentation Considerations

https://github.com/Agoric/agoric-sdk/blob/master/docs/env.md should be updated to explain the feature flag and the resulting restrictions on passable strings.

Testing Considerations

The first time this PR ran through CI with code enforcing the restriction, but not yet any code to test the restriction, it is unfortunate that CI came up green. This means that before this PR, when we were accepting non-well-formed strings by design, nothing tested that case, at least in a way that caused a test to break because of the added enforcement.

Compatibility Considerations

Enabling this feature is in theory a compatibility break, in that previously non-well-formed strings were supposed to work. However, aside from possible tests specifically about non-well-formed strings, we do not expect any actual code to break. See @kriskowal 's note below at #2002 (comment)

With the feature disabled, which is currently the default, there should be no compat issue at all.

Upgrade Considerations

With the feature defaulting to disabled, there is not yet an breaking change or upgrade issue.

  • [ ] Includes *BREAKING*: in the commit message with migration instructions for any breaking change.
  • Updates NEWS.md for user-facing changes.

@erights erights self-assigned this Jan 24, 2024
@erights erights marked this pull request as draft January 24, 2024 19:34
@erights erights force-pushed the markm-passable-strings-wellformed branch from 068d8f5 to 6e30e57 Compare January 24, 2024 19:35
@kriskowal
Copy link
Member

Under compatibility considerations, this will cause any existing program that is transmitting unpaired surrogates to break. I do not think we need to treat this as a breaking change, but we should note in NEWS.md that we are hoping that nobody is exercising the broken behavior, so that when someone upgrades and if their application breaks, they have a hint about why.

@kriskowal
Copy link
Member

I think this is a good change but it is marked as draft. Is that intentional? This is in my review inbox. Is this ready for review?

@erights
Copy link
Contributor Author

erights commented Jan 24, 2024

I think this is a good change but it is marked as draft. Is that intentional? This is in my review inbox. Is this ready for review?

It lacks only tests, filling out the PR comment and checklist, and NEWS.md text.

I do not think we need to treat this as a breaking change

Should I remove the "!"?

@erights
Copy link
Contributor Author

erights commented Jan 24, 2024

It also lacks adequate comments in the code

@erights erights force-pushed the markm-passable-strings-wellformed branch 2 times, most recently from 20c2a97 to 301077e Compare January 24, 2024 21:18
@erights erights changed the title fix(pass-style)!: only well-formed strings are passable fix(pass-style): only well-formed strings are passable Jan 24, 2024
@erights
Copy link
Contributor Author

erights commented Jan 24, 2024

"!" removed

@erights erights force-pushed the markm-passable-strings-wellformed branch 2 times, most recently from 47d0bb9 to 3104dc4 Compare January 24, 2024 23:55
@erights erights marked this pull request as ready for review January 24, 2024 23:55
@erights erights requested a review from dckc January 24, 2024 23:56
@erights erights force-pushed the markm-passable-strings-wellformed branch from 3104dc4 to 6f656b4 Compare January 24, 2024 23:57
@erights
Copy link
Contributor Author

erights commented Jan 25, 2024

Under compatibility considerations, this will cause any existing program that is transmitting unpaired surrogates to break. I do not think we need to treat this as a breaking change,

I removed the "!". Done.

but we should note in NEWS.md that we are hoping that nobody is exercising the broken behavior, so that when someone upgrades and if their application breaks, they have a hint about why.

Done.

Given that I removed the "!", just confirming that I should not Includes *BREAKING*: in the commit message, letting NEWS.md do the work. Yes?

Is this ready for review?

yes

It lacks only tests,

Done

filling out the PR comment and checklist,

Done, with the first box unchecked.

and NEWS.md text.

Done

It also lacks adequate comments in the code

Done

@erights erights force-pushed the markm-passable-strings-wellformed branch 2 times, most recently from ff16bf9 to e615774 Compare January 25, 2024 00:08
@dckc
Copy link
Contributor

dckc commented Jan 25, 2024

With this PR, the check will often be O(n) in the length of the string

How often? What's the wall-clock impact on the Agoric blockchain? I hope we don't land this until we have pretty clear data on that.

@dckc
Copy link
Contributor

dckc commented Jan 25, 2024

I do not think we need to treat this as a breaking change ...

I don't understand why not. It's clearly a change that's observable from clients.

@erights
Copy link
Contributor Author

erights commented Jan 25, 2024

I don't understand why not.

@kriskowal , I leave that to you. I'm willing to go back to "!" if you think I should.

@erights
Copy link
Contributor Author

erights commented Jan 25, 2024

How often? What's the wall-clock impact on the Agoric blockchain? I hope we don't land this until we have pretty clear data on that.

While I agree, can someone else take on actually measuring this?

Copy link
Contributor

@gibson042 gibson042 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't predict impact on the Agoric blockchain, but the effects on toCapData (which includes passStyleOf on the full input data structure) in isolation seem to be negligible in V8 and significant in XS (but only for long strings). Measuring the time to smallcaps-encode harden({ strings: ["", "foo", "A little bit longer now...", "A".repeat(1000), "𝌆".repeat(500)] }), an array of 10 copies thereof, and a clone without the strings of length > 100:

Before

#### Moddable XS
encode: 10.00 ops/ms
encode10: 1.23 ops/ms
encodeShortStrings: 8.70 ops/ms

#### V8
encode: 28.57 ops/ms
encode10: 5.71 ops/ms
encodeShortStrings: 25.00 ops/ms

After

#### Moddable XS
encode: 0.98 ops/ms
encode10: 0.18 ops/ms
encodeShortStrings: 9.52 ops/ms

#### V8
encode: 20.00 ops/ms
encode10: 5.88 ops/ms
encodeShortStrings: 40.00 ops/ms

I'm in favor of this change... the hit in a narrow operation is noticeable only for big input, and is likely to be replaced by native code in the relatively near future.

@erights erights force-pushed the markm-passable-strings-wellformed branch from e615774 to 856b6af Compare January 25, 2024 20:41
@erights erights requested a review from ivanlei January 25, 2024 21:08
@erights
Copy link
Contributor Author

erights commented Jan 25, 2024

@gibson042 , thanks for measuring that!

@erights erights force-pushed the markm-passable-strings-wellformed branch 2 times, most recently from 25a07a1 to b0051b9 Compare January 26, 2024 02:08
@erights erights changed the title fix(pass-style): only well-formed strings are passable fix(pass-style)!: only well-formed strings are passable Jan 26, 2024
@erights
Copy link
Contributor Author

erights commented Jan 26, 2024

At #2008 (review) we decided to add the "!" back in to this PR.

Done.

Now I need to add an appropriate *BREAKING*: to the commit message.

@dckc
Copy link
Contributor

dckc commented Jan 26, 2024

I'm interested to learn whether the breaking change judgement is more of a math problem (does there exist a test by which I could observe it?) or a cost-benefit / policy sort of thing.

This is a tree falls in the forest situation. ... I’ve asked around for whether it would actually be observed. If we can guarantee not ...

A guarantee would involve a closed-world assumption, yes? i.e. an assumption that we can somehow find all the clients. Is our working model that we can, in fact do that?

Thought experiment: have clients send a sort of "warrantee registration card" - if you let us know that you depend on our stuff, we'll make every effort to consider your stuff when judging breaking changes etc.

@dckc
Copy link
Contributor

dckc commented Jan 26, 2024

likely to be replaced by native code in the relatively near future.

I'd like to see 2 issues:

  1. that tracks integration of s.isWellFormed() in xsnap (i.e. upgrading XS version, adding a test, ...)
  2. that tracks shipping of that version of xsnap on chain, with a dependency so that it's scheduled before metering

Maybe those (or issues that subsume them) exist already?

@erights erights force-pushed the markm-passable-strings-wellformed branch from b0051b9 to ffec363 Compare January 29, 2024 06:37
@erights erights force-pushed the markm-passable-strings-wellformed branch 3 times, most recently from 73c10e6 to 241dfc0 Compare February 9, 2024 22:38
@erights erights force-pushed the markm-passable-strings-wellformed branch 3 times, most recently from ce6f952 to a7a8d37 Compare February 20, 2024 01:36
@dckc dckc removed their request for review February 20, 2024 18:17
@erights erights force-pushed the markm-passable-strings-wellformed branch from a7a8d37 to 03f60e8 Compare February 24, 2024 04:06
@erights erights force-pushed the markm-passable-strings-wellformed branch from 03f60e8 to 953cd6d Compare March 13, 2024 00:13
@erights erights force-pushed the markm-passable-strings-wellformed branch from 953cd6d to d731726 Compare March 13, 2024 20:00
@erights erights changed the title fix(pass-style)!: only well-formed strings are passable feat(pass-style): feature flag: only well-formed strings are passable Mar 13, 2024
@erights erights requested review from kriskowal and gibson042 March 13, 2024 20:12
@erights
Copy link
Contributor Author

erights commented Mar 13, 2024

@kriskowal @gibson042 , I hid this new feature behind a feature flag, defaulting to disabled, so we could include it in the upcoming endo release, to start experimenting with in from agoric-sdk. Even though you already approved, it is a big enough change that I'd appreciate it if you could PTAL before I merge. Thanks!

@erights erights merged commit bca1e3f into master Mar 13, 2024
14 checks passed
@erights erights deleted the markm-passable-strings-wellformed branch March 13, 2024 22:29
mergify bot pushed a commit to Agoric/agoric-sdk that referenced this pull request Apr 15, 2024
closes: #XXXX
refs: endojs/endo#2002
endojs/endo#1860

## Description

Now that agoric-sdk depends on a version of endo that switches on these
environment variables, it is time for `env.md`, where we gather all such
explanations, to explain these additional env vars.

### Security Considerations
none, since this PR only documents the existing situation.

For the environment variable sensitive behavior itself, I think none as
well. We should examine more closely in general whether env var
sensitivity in our code opens up any opportunity for attackers. We think
not though, because attackers should only ever execute in environments
where these are not set to non-defensive settings. And attackers should
never be in a position to set these.

### Scaling Considerations

none. But the new text does explain the existing scaling consideration
around the `ONLY_WELL_FORMED_STRINGS_PASSABLE` environment variable.

### Documentation Considerations

the point. Since we gather all these explanations in agoric-sdk's env.md
file, there is an inter-repo coordination problem after introducing new
env vars into endo.

### Testing Considerations
none
### Upgrade Considerations
none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants