Conversation
node/core/pvf/src/executor_intf.rs
// The simplest way to fix that is to spawn a new thread with the desired stack limit. One may
// think that it may be too expensive, but I disagree. The cost of creating a new thread is
// measured in microseconds, negligible compared to other costs.
This could be made essentially free if we'd just keep the threads around instead of spawning them from scratch every time, e.g. by constructing a rayon::ThreadPool and using its scope to spawn the task (so the code wouldn't even have to change significantly). But as you said it might not matter considering other costs.

Also, it's good to remember that spawning a new thread has other indirect costs besides the cost of just creating the thread itself. E.g. assuming the full 256MB of stack space could be used from within WASM, it'd mean that (since we're spawning a new thread from scratch every time) those 256MB of stack space would have to be mmap'd (and zeroed) from scratch every time; there are also a few other cases where it affects things indirectly (e.g. by triggering process-wide kernel locks whose effect you can only see if there's something else running in the process concurrently). So a general assertion of "creating new threads is cheap" might be somewhat misleading depending on the situation.
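For concreteness, here is a minimal sketch of the suggested pool-based approach (illustrative names and stack-size constant, not the PR's actual code):

```rust
// Keep one worker thread with a large stack alive in a rayon pool and run each
// execution on it, instead of spawning a fresh std::thread every time.
use rayon::ThreadPoolBuilder;

const NATIVE_STACK_MAX: usize = 256 * 1024 * 1024; // hypothetical 256 MiB limit

fn build_pool() -> rayon::ThreadPool {
    ThreadPoolBuilder::new()
        .num_threads(1)
        .stack_size(NATIVE_STACK_MAX)
        .build()
        .expect("failed to build execution thread pool")
}

fn execute_on_big_stack<F, R>(pool: &rayon::ThreadPool, f: F) -> R
where
    F: FnOnce() -> R + Send,
    R: Send,
{
    // `scope` runs the closure on a pool thread (the one with the big stack);
    // the thread is reused across calls, so there is no per-execution spawn cost.
    pool.scope(|_| f())
}
```

Since the pool would be built once and reused, the large stack would only have to be mapped once per worker thread.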
I will answer here, @eskimor, @ordian, to kill two birds with one stone, and also provide more context for @koute:

How I see it working: a stack would be allocated with mmap, yes. However, that allocates only a mapping. The pages are not backed by anything until they are first accessed; they are just mappings at that point and stay that way until the first access, which is also when the zeroing happens. So it's not precisely "from scratch every time". Moreover, since we are talking about Linux, glibc has a caching layer that recycles stacks of the same size, AFAIK. Between recycles, I expect madvise(DONTNEED) to be called. Thus, the memory will be marked as free and can be reused as needed, and the zeroing will happen after the first access.

However, we can assume that in the worst case all of the stack will be dirty.

In the "spawn-and-forget" approach, this releases the memory between executions.

In the thread pool approach, the stacks are not returned and not cleared. That means that if at least one execution dirtied 256 MiB of stack, it stays reserved for the rest of the lifetime of that thread. That also implies the stack won't have to be zeroed, which improves performance, and security-wise it's OK. Besides that, to my knowledge, stacks are mmaped without any special flags like MAP_LOCKED, which means they can be swapped out. Thanks to that, if there was memory pressure, I would expect the unused parts to gradually be swapped out. On the other hand, accessing those pages leads to restoring them from swap, and that can be a hit.
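To illustrate the demand-paging point, a sketch (my own illustration using the libc crate, not code from the PR) of how such a stack-sized anonymous mapping behaves:

```rust
use std::{io, ptr};

// Reserve `len` bytes of address space for a stack-like region. Only the
// mapping is created here; physical pages are allocated (and zeroed) by the
// kernel lazily, on the first access to each page.
fn reserve_region(len: usize) -> io::Result<*mut libc::c_void> {
    // SAFETY: anonymous private mapping with no backing file; the kernel
    // chooses the address.
    let ptr = unsafe {
        libc::mmap(
            ptr::null_mut(),
            len,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
            -1,
            0,
        )
    };
    if ptr == libc::MAP_FAILED {
        return Err(io::Error::last_os_error());
    }
    // At this point `len` bytes of address space exist, but no RAM is consumed
    // until pages are touched.
    Ok(ptr)
}
```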
Perhaps the ideal approach would be to use a thread pool and then madvise(DONTNEED) the upper portion of the stack. That allows us to not bother with stack clearing during the execution of benign PVFs1, but in the degenerate cases we will only have to zero pages instead of swapping them in and out.
Footnotes
1. Remember, we are reserving so much stack space only to account for the cases where a malicious PVF managed to overstep our wasm deterministic stack instrumentation. ↩
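A sketch of what that could look like (hypothetical helper; the address and alignment handling are assumptions, not the PR's code):

```rust
use std::io;

/// Release the unused upper part of a pooled worker's stack after an execution,
/// so that pages dirtied above the normal watermark are returned to the kernel
/// instead of staying resident.
unsafe fn release_upper_stack(region_start: *mut libc::c_void, len: usize) -> io::Result<()> {
    // SAFETY: the caller guarantees that `region_start..region_start + len` is a
    // valid, page-aligned anonymous mapping (the unused upper part of the stack).
    //
    // MADV_DONTNEED tells the kernel the pages can be dropped; anonymous memory
    // reads as zero afterwards, so the next deep-stack execution only pays for
    // re-zeroing, not for swapping pages in and out.
    let ret = unsafe { libc::madvise(region_start, len, libc::MADV_DONTNEED) };
    if ret != 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}
```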
Considering that this limit should essentially never be hit in normal circumstances, and that even if it is hit, with the default limit of only 2 threads (from what you've said in the other comment) it'd be only ~512MB in the worst case, I think that's totally fine as-is.
I'd really like to see some numbers on whether this has any performance impact, but otherwise it looks good to me.
Looks sensible. A burn-in on Versi would be good before merge (just to make sure no crazy amounts of physical memory are acquired or anything like that).
@tdimitrov can you take care of burning this in on a validator group on Versi please?
FWIW, just pushed a change with a thread pool as requested by @koute.
Sure! @pepyakin, anything specific to look for in the logs/metrics? (besides the memory usage, of course)
Can't think of anything else besides that.
Yeah, if nodes don't crash and memory usage does not spike, everything should be fine.
Thanks @tdimitrov!
* master:
  zombienet: try to fix parachains upgrade test (#5724)
  Update dependencies (companion for substrate#11722) (#5731)
  Update metric name and doc (#5716)
  Bump reqwest from 0.11.10 to 0.11.11 (#5732)
  add release-engineering to CI files' reviewers (#5733)
  Bump parity-scale-codec from 3.1.2 to 3.1.5 (#5720)
  Add checklist item (#5715)
  Fix 5560: add support for a new `staking-miner info` command (#5577)
  Bump `wasmtime` to 0.38.0 and `zstd` to 0.11.2 (companion for substrate#11720) (#5707)
  pvf: ensure enough stack space (#5712)
  Bump generic-array from 0.12.3 to 0.12.4 in /bridges/fuzz/storage-proof (#5648)
  pvf: unignore `terminates_on_timeout` test (#5722)
  Bump proc-macro2 from 1.0.39 to 1.0.40 (#5719)
  pass $COMPANION_OVERRIDES to check_dependent_project (#5708)
  Bump thread_local from 1.1.0 to 1.1.4 in /bridges/fuzz/storage-proof (#5687)
  Bump quote from 1.0.18 to 1.0.19 (#5700)
  Rococo: add new pallet-beefy-mmr API (companion for substrate#11406) (#5516)
  Update metric before bailing out (#5706)
  Add publish docker staking-miner (#5710)
* pvf: ensure enough stack space
* fix typos
  Co-authored-by: Andronik <[email protected]>
* Use rayon to cache the thread
  Co-authored-by: Andronik <[email protected]>
This PR fixes a potential stack overflow.
Exploiting this is not trivial. We impose a limit of 65536 logical items on the wasm value stack. If we assume that each value was spilled on the stack (which is not trivial to achieve), then that would be only 512 KiB1. In case this was exploited, the process which hosts the execution worker would be aborted, and the PVF host anticipates that2.
Because exploitation is not trivial, I've not included a test. In order to test locally, I had to lower the limits to actually trigger this condition.
Footnotes
1. which makes me think that this number could be bumped. ↩
2. although it can still be problematic if core dumps are enabled. ↩
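As a back-of-the-envelope check of the 512 KiB figure, assuming each spilled wasm value occupies 8 bytes on the native stack (the per-value size is an assumption of this estimate, not something stated in the description):

```rust
// 65_536 logical items * 8 bytes per spilled value = 524_288 bytes = 512 KiB.
const LOGICAL_STACK_LIMIT: usize = 65_536;
const ASSUMED_BYTES_PER_VALUE: usize = 8;

fn main() {
    let worst_case = LOGICAL_STACK_LIMIT * ASSUMED_BYTES_PER_VALUE;
    assert_eq!(worst_case, 512 * 1024);
    println!("worst-case spilled value stack: {} KiB", worst_case / 1024);
}
```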