Acceptance test for new Moddable XS versions #6929

mhofman · 2023-02-06T20:26:34Z

What is the Problem Being Solved?

With XS heap snapshots and transcripts in consensus (a requirement for state-sync), we need to make sure we don't regress on non-divergent execution when updating XS versions.

Furthermore we now believe that our vat transcripts are agnostic of the version of XS used, or its allocation behavior, as long as the implemented JS spec didn't change its behavior. We may be able to use this property to enable transparent XS version upgrades.

Finally, we need to ensure that a variety of environments and even CPU architectures produce consistent executions and snapshots (within the limits of known incompatibilities, such as endianness).

Description of the Design

High level approach

While there is no formal way of verifying that XS is deterministic, especially when reload is involved, we can use empirical tests to help us gain confidence that XS stays consistent.

To confirm compatibility of XS engines, we have 2 main approaches, which should both be part of the acceptance criteria:

1. Existing unit / integration tests succeed.
   - This includes running these tests under the sanitizer (ASAN, MSAN, UBSAN) versions of xsnap-worker, with no new warnings triggered compared to the outgoing version.
2. Replaying a large corpus of known execution.
   - This is effectively replaying vat transcripts (the deliveries/syscalls captured at the boundary between the kernel and the vat) while keeping some old vat workers after snapshots, and comparing the reload vs no-reload behavior and the hashes of generated snapshots.
     - The snapshots only need to be consistent within the replay, as a difference from the original snapshot hash is expected in this case.
   - To detect divergences between environments / architectures, this requires running this test in the different environments and comparing the generated snapshot hashes.
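
The cross-environment comparison described above could be scripted roughly as follows. This is only a sketch under stated assumptions: it supposes each environment's snapshot hashes have already been extracted into a plain-text file, one hash per line in replay order. That file format and the function name `compare_snapshot_hashes` are hypothetical and are not something the replay tools are known to produce.

```sh
# Hypothetical helper: compare two lists of snapshot hashes (one per line),
# each produced by replaying the same vat transcripts in a different environment.
# Returns 0 when the lists are identical, 1 otherwise.
compare_snapshot_hashes() {
  if diff "$1" "$2" >/dev/null; then
    echo "snapshot hashes match"
  else
    echo "snapshot hashes diverge:"
    # Print the differing lines; ignore diff's non-zero exit status here.
    diff "$1" "$2" || true
    return 1
  fi
}
```

For example, `compare_snapshot_hashes x86_64-hashes.txt arm64-hashes.txt` (both file names are placeholders) would exit non-zero as soon as one environment produced a different hash.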

Generating the known good execution corpus

Our current corpus of execution is derived from the mainnet pismo release. Because of #6784, and the lack of its mitigation in the pismo release, we cannot simply use the vat transcripts generated by a pismo follower node. Instead we use the vat transcripts that were generated using "chain transcripts" as follows:

A follower node running the mhofman/pismo-with-replay-tools-improvements branch, with the "chain transcript" feature enabled (see feat(cosmic-swingset): chain-level transcripts #6931)
```
CHAIN_TRANSCRIPT=$HOME/.agoric/data/chain-transcript.log SLOGSENDER_AGENT=process SLOGFILE=$HOME/.agoric/data/chain-$(date +%s).slog agd start
```

Replay the chain transcript with a patched version of the pismo XS which hides organic GC from liveslots. Use branch mhofman/6784-hide-organic-gc-pismo to run the following:

cat $HOME/.agoric/data/chain-transcript.log | SLOGFILE=ag-cosmos-chain-state/chain-$(date +%s).slog SLOGSENDER_AGENT=process CHAIN_BOOTSTRAP_VAT_CONFIG=/path/to/agoric-sdk/packages/vats/decentral-main-psm-config.json DEBUG=agoric,SwingSet /path/to/agoric-sdk/packages/cosmic-swingset/scripts/replay-chain.js -

Extract lockdown-bundle, supervisor-bundle, and vat transcripts from the chain replay's Swingstore using packages/SwingSet/misc-tools/extract-transcript-from-kerneldb.js
- It may be necessary to use packages/SwingSet/misc-tools/extract-transcript-from-slogfile.js for any vat that has exited (missing transcript)

These have already been generated up to block 8941748 (March 6th 2023) and are available on the benchmark machine.

Replay with newer XS version

Using the mhofman/pismo-with-replay-tools-improvements branch as a base (or whatever branch generated the transcripts), update xsnap (both moddable and xsnap-pub) to the target version.
- Newer versions of xsnap-worker include the same organic GC patch, and will also hide allocation differences from liveslots, theoretically making them compatible.
Use packages/SwingSet/misc-tools/replay-transcript.js to replay each vat transcript, forcing a reload when snapshots are taken
- This will execute the vat transcript, reloading at every snapshot being taken into a new worker, keeping existing workers around
- The default config may be too demanding on resources for some vats with a lot of deliveries, and keeping slightly fewer workers may help alleviate this (e.g. --keep-worker-recent=20 --keep-worker-interval=20)
- This step can be somewhat parallelized on multiple machines (it's very resource intensive)
- If this step fails, the new XS version has some significant difference in JS execution behavior.
  - One known difference may be the introduction of new well-known symbols which old SES does not handle correctly. It may be possible to use a patched lockdown and generate a new lockdown-bundle that incorporates feat(ses): tame Symbol so whitelist works endojs/endo#1579 to mitigate this issue.
Verify that no anachrophobia or snapshot hash differences occurred.
- E.g. using grep -A4 anachrophobia replay-v*.log; grep -B2 hashes replay-v*.log returning no matches.
Repeat the replay on other environments and compare the snapshot hashes (possibly using the snapshot-activity.jsonl generated by the replay tool).
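
The log verification step above can be wrapped in a small helper so that it fails loudly instead of relying on someone reading grep output. This is a minimal sketch: the function name `check_replay_logs` and the idea of passing a log directory are assumptions; only the two grep patterns and the `replay-v*.log` naming come from the instructions above.

```sh
# Hypothetical helper: scan every replay-v*.log in a directory (default: the
# current one) and return non-zero if any log mentions "anachrophobia" or
# "hashes", printing the same context the manual grep commands would show.
# Note: returns 0 when no matching log files exist at all.
check_replay_logs() {
  dir=${1:-.}
  status=0
  for log in "$dir"/replay-v*.log; do
    # Skip the unexpanded glob when there are no log files.
    [ -e "$log" ] || continue
    if grep -A4 anachrophobia "$log"; then status=1; fi
    if grep -B2 hashes "$log"; then status=1; fi
  done
  return "$status"
}
```

Running `check_replay_logs /path/to/replay/output` after a replay then gives a single pass/fail exit status that can be checked per environment.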

Tasks

Update vat transcript extract and replay tools to post bulldozer #8494

Labels: SwingSet, enhancement, xsnap

raphdev · 2023-03-13T22:23:03Z

> Existing unit / integration tests succeed
> This includes running these tests under the sanitizer (ASAN, MSAN, UBSAN) versions of xsnap-worker, and no new warnings being triggered compared to the outgoing version

For posterity, the steps to do this are documented in agoric-labs/xsnap-pub#35

warner · 2023-04-24T22:35:45Z

@ivanlei and I think this can be pushed to vaults+1, because PR #7083 is ready to land, and this test was run on the new version of XS at least once. We want the test to be landed soon, so other people can run it, but the vaults release doesn't block on that landing.

mhofman · 2023-04-24T22:37:30Z

I'm in the process of updating this issue with new manual testing instructions based on the latest Moddable SDK changes. Automating that test would be nice to have, but is not strictly necessary.

mhofman · 2023-05-26T20:36:05Z

I have updated the acceptance test instructions to reflect our new (still manual) methodology, and to suggest multi-environment testing that would hopefully help detect issues like #7829 and #7841.

mhofman · 2023-05-30T21:11:31Z

New issue that would have been caught had we run under multiple environments: #7864

mhofman · 2023-10-03T04:14:59Z

One thing to keep in mind is that this methodology may not work forever, and some XS changes may introduce observable execution differences when they pick up some spec changes. While JavaScript attempts to not break the web, not every change is fully backwards compatible. For example, something that used to throw may no longer throw, or some property accesses may change order or be deduplicated (e.g. Moddable-OpenSource/moddable#1223 (comment)).

While such changes are unlikely to be observed by most programs, that remains a possibility which would require some sort of mitigation (possibly changes to the engine to restore the former behavior if needed).

toliaqat · 2024-09-23T19:01:42Z

@mhofman and @siarhei-agoric will meet and break down this bigger issue into smaller chunks.

mhofman added enhancement New feature or request xsnap the XS execution tool labels Feb 6, 2023

This was referenced Feb 6, 2023

feat(xsnap)!: Update Moddable SDK and xsnap-native #6920

Merged

feat(cosmic-swingset): chain-level transcripts #6931

Draft

ivanlei added this to the Vaults EVP milestone Feb 21, 2023

ivanlei assigned mhofman Feb 21, 2023

ivanlei added the vaults_triage DO NOT USE label Feb 21, 2023

mhofman mentioned this issue Feb 27, 2023

Upgrade XS to 3.8 #7083

Closed

mhofman mentioned this issue Apr 24, 2023

feat(xsnap)!: Update to Moddable SDK 3.8.7 #7498

Merged

warner removed this from the Vaults EVP milestone Apr 24, 2023

ivanlei added the v1_triaged DO NOT USE label May 5, 2023

This was referenced May 26, 2023

Non traumatic major XS upgrades #7855

Open

XS 3.8.7 has bad arithmetic on BigInt when __has_builtin(__builtin_add_overflow) #7864

Closed

mhofman mentioned this issue May 30, 2023

fix(xsnap): update Moddable SDK to fix BigInt arithmetic #7866

Merged

This was referenced Nov 1, 2023

Cherry-pick xsnap fixes for next chain upgrade #8487

Open

Update vat transcript extract and replay tools to post bulldozer #8494

Open

mhofman mentioned this issue Sep 5, 2024

swing-store should keep historical artifacts outside of SQLite DB #9389

Open

toliaqat assigned siarhei-agoric Sep 23, 2024

mhofman mentioned this issue Sep 23, 2024

Integrate XS native Compartment support #9451

Open

mhofman mentioned this issue Nov 8, 2024

cosmic-swingset chain transcript and replay #10431

Open
