Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PVF: include polkadot's version into the artifact path #695

Closed
mrcnski opened this issue Mar 13, 2023 · 6 comments · Fixed by #1828
Closed

PVF: include polkadot's version into the artifact path #695

mrcnski opened this issue Mar 13, 2023 · 6 comments · Fixed by #1828
Assignees

Comments

@mrcnski
Copy link
Contributor

mrcnski commented Mar 13, 2023

ISSUE

Overview

Compilation artifact filenames should contain the polkadot version. This way if there is a node upgrade, an old version can't override an artifact for the new node.

@mrcnski
Copy link
Contributor Author

mrcnski commented Mar 14, 2023

We may also want to include a hash per run; see here:

paritytech/polkadot#6861 (comment)

@mrcnski
Copy link
Contributor Author

mrcnski commented Aug 8, 2023

We may also want to include a hash per run; see here:

#6861 (comment)

Note this is only if we don't want artifacts to persist across restarts. But I think most of the issues with reusing artifacts across restarts would be addressed by this issue and by #677. See #685

@eagr
Copy link
Contributor

eagr commented Oct 8, 2023

I presume no one is working on this right? @s0me0ne-unkn0wn @mrcnski

I dug around the code a bit. On node startup, all existing artifacts will be nuked. Then a host will spin up a worker to compile the PVF and it writes to a tmp file on completion on success. Later, the host will be notified and rename the tmp file to the dest. But the artifact is only executed if it is marked Prepared. Please correct me if I got it wrong.

I could imagine there will be some corner cases, like a host restarting during PVF compilation and purging the caches before compilation is completed. But even that wouldn’t be a problem if the host doesn’t deem the artifact prepared, which doesn’t seem to be the case.

Please help me understand why are these needed. Like in what scenario this could otherwise be a problem? Or is this some kind of precaution?

@s0me0ne-unkn0wn
Copy link
Contributor

@eagr, the initial idea comes from an incident we had some time ago. Worker binaries were not separate then and spawned from the main polkadot binary. There was a validator whose operator upgraded his node in the wrong way. He overwrote the polkadot binary with a new version and didn't restart the node. There was an artifact in cache generated by the old node version. The new node came with a new version of Wasmtime that introduced some breaking changes, and that new version of Wasmtime wasn't able to execute artifacts produced by older versions. So, when it came to the execution, the node spawned an execution worker from the new polkadot binary, Wasmtime refused to execute the old artifact, validation failed, and the validator started a dispute. As the dispute obviously concluded invalid, the validator might be slashed.

That's some pre-history of why we wanted to include the node version into the artifact path initially. The workers have been split out since then, and we have a version control between node and workers, so this measure does not make as much sense for that very problem now. But now there're discussions on preserving artifacts between runs. Preparation is a very resource-heavy process, and in the light of on-demand parachains upcoming, it would be good to cut on resource usage, so if artifacts are reusable, why not reuse them? But we must ensure they're really reusable, that is, produced with a proper node version.

Hope it helps, and maybe @mrcnski might add something, I'm not sure I remember all the background and reasoning behind this issue.

@eagr
Copy link
Contributor

eagr commented Oct 8, 2023

But now there're discussions on preserving artifacts between runs.

With that change, these additions definitely make sense now. Thanks for the elaboration! @s0me0ne-unkn0wn

@mrcnski
Copy link
Contributor Author

mrcnski commented Oct 9, 2023

Good question @eagr and great answer @s0me0ne-unkn0wn! Thanks!

bkchr pushed a commit that referenced this issue Oct 15, 2023
closes #695

Could potentially be helpful to preserving caches when applicable, as
discussed in #685

kusama address: FvpsvV1GQAAbwqX6oyRjemgdKV11QU5bXsMg9xsonD1FLGK
claravanstaden pushed a commit to Snowfork/polkadot-sdk that referenced this issue Dec 8, 2023
…h#699)

* SNO-324: Enable multiple accounts in outbound channel contract (paritytech#685)

* Track nonces by source address

* Remove principal from basic outbound channel

* Update relayer bindings

* Track nonces by origin address

* Fix outbound channel tests

* SNO-325: Enable multiple accounts in inbound channel pallet (paritytech#687)

* Add origin field to message decoding

* Track nonces by message origin

* Include origin address in basic channel message id

* Update basic inbound fixture data

* Add note about updating inbound channel test data

* Mention message data encoding in comment

* Update test data in envelope test

* Remove channel id from message id

* s/origin/user/g

* Swap the order of source & user

* Reword test data generation section

* s/value/nonce

* Use named fields in MessageId enum

* Switch to Twox64Concat hasher

This prevents accidental expensive lookups while remaining relatively
fast vs Blake2_128 and resistant to attacks that cause prefix
collisions, thanks to the security of keccak used to create the Ethereum
address.

* Remove rogue print statement

* SNO-326: Forward messages by address in relayer (paritytech#695)

* Remove redundant slice

* Add initial address filtering

* Remove extra nonce var

* Clean up basic channel address config

* Rename parachain relayer account config

* Remove unused method

* Fix up comments

* Add default eth addresses for basic channel

* Improve map log readability

* Fix parachain writer error messages

* Bump lodestar version in example command

* Switch to default eth account for E2E script

* Increase default timeout from 6m 40s to 25m

This helps with some of the longer running tests that are timing out
when given 400s.

* Rename mapping to nonce

* Rename origin to account in basic outbound channel

* Remove principal from outbound channel contract

* Rename user to account in basic inbound channel

* Fix event account field rename in test
helin6 pushed a commit to boolnetwork/polkadot-sdk that referenced this issue Feb 5, 2024
* upgrade substrate to polkadot-v0.9.22

cargo update -p sp-std --precise 616d33ea23bab86cafffaf116fc607b6790fb4eb

* replace jsonrpc-* dependencies by jsonrpsee

* wip

* ref: migrate to jsonrpsee

* migrate template to jsonrpsee

* fix clippy warnings

* fix rust tests

* fix manual seal compilation

* fmt

* Add ethereum style subscription id provider

* ref: not use atomic

* Use ethereum style subscription id provider in manual-seal mode too

* fix manual seal build

* fix: subscription id should be prefixed by 0x

* for some reason, the extrinsic gas limit change

* pin substrate 6c14be4

* fix build
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants