Conversation
node/core/pvf/src/executor_intf.rs
// The simplest way to fix that is to spawn a new thread with the desired stack limit. One may
// think that it may be too expensive, but I disagree. The cost of creating a new thread is
// measured in microseconds, negligible compared to other costs.
This could be made essentially free if we'd just keep the threads around instead of spawning them from scratch every time, e.g. by constructing a rayon::ThreadPool and using its scope to spawn the task (so the code wouldn't even have to change significantly). But as you said it might not matter considering other costs.

Also, it's good to remember that spawning a new thread has other indirect costs besides the cost of just creating the thread itself. E.g. assuming the full 256MB of stack space could be used from within WASM, it'd mean that (since we're spawning a new thread from scratch every time) those 256MB of stack space would have to be mmap'd (and zeroed) from scratch every time; there are also a few other cases where it affects things indirectly (e.g. by triggering process-wide kernel locks whose effect you can only see if there's something else running in the process concurrently). So a general assertion of "creating new threads is cheap" might be somewhat misleading depending on the situation.
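For concreteness, here is a minimal sketch of the suggested pool-based approach (illustrative names and stack-size constant, not the PR's actual code):

```rust
// Keep one worker thread with a large stack alive in a rayon pool and run each
// execution on it, instead of spawning a fresh std::thread every time.
use rayon::ThreadPoolBuilder;

const NATIVE_STACK_MAX: usize = 256 * 1024 * 1024; // hypothetical 256 MiB limit

fn build_pool() -> rayon::ThreadPool {
    ThreadPoolBuilder::new()
        .num_threads(1)
        .stack_size(NATIVE_STACK_MAX)
        .build()
        .expect("failed to build execution thread pool")
}

fn execute_on_big_stack<F, R>(pool: &rayon::ThreadPool, f: F) -> R
where
    F: FnOnce() -> R + Send,
    R: Send,
{
    // `scope` runs the closure on a pool thread (the one with the big stack);
    // the thread is reused across calls, so there is no per-execution spawn cost.
    pool.scope(|_| f())
}
```

Since the pool would be built once and reused, the large stack would only have to be mapped once per worker thread.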
I will answer here, @eskimor, @ordian, to kill two birds with one stone, and also provide more context for @koute:

How I see it working: a stack would be allocated with mmap, yes. However, that allocates only a mapping. The pages are not backed by anything until they are first accessed; they are just mappings at that point and stay that way until the first access, which is also when the zeroing happens. So it's not precisely "from scratch every time". Moreover, since we are talking about Linux, glibc has a caching layer that recycles stacks of the same size, AFAIK. Between recycles, I expect madvise(DONTNEED) to be called. Thus, the memory will be marked as free and can be reused as needed, and the zeroing will happen after the first access.

However, we can assume that in the worst case all of the stack will be dirty.

In the "spawn-and-forget" approach, this releases the memory between executions.

In the thread pool approach, the stacks are not returned and not cleared. That means that if at least one execution dirtied 256 MiB of stack, it stays reserved for the rest of the lifetime of that thread. That also implies the stack won't have to be zeroed, which improves performance, and security-wise it's OK. Besides that, to my knowledge, stacks are mmaped without any special flags like MAP_LOCKED, which means they can be swapped out. Thanks to that, if there was memory pressure, I would expect the unused parts to gradually be swapped out. On the other hand, accessing those pages leads to restoring them from swap, and that can be a hit.
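To illustrate the demand-paging point, a sketch (my own illustration using the libc crate, not code from the PR) of how such a stack-sized anonymous mapping behaves:

```rust
use std::{io, ptr};

// Reserve `len` bytes of address space for a stack-like region. Only the
// mapping is created here; physical pages are allocated (and zeroed) by the
// kernel lazily, on the first access to each page.
fn reserve_region(len: usize) -> io::Result<*mut libc::c_void> {
    // SAFETY: anonymous private mapping with no backing file; the kernel
    // chooses the address.
    let ptr = unsafe {
        libc::mmap(
            ptr::null_mut(),
            len,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
            -1,
            0,
        )
    };
    if ptr == libc::MAP_FAILED {
        return Err(io::Error::last_os_error());
    }
    // At this point `len` bytes of address space exist, but no RAM is consumed
    // until pages are touched.
    Ok(ptr)
}
```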
Perhaps the ideal approach would be to use a thread pool and then madvise(DONTNEED) the upper portion of the stack. That allows us to not bother with stack clearing during the execution of benign PVFs1, but in the degenerate cases we will only have to zero pages instead of swapping them in and out.
Footnotes
1. Remember, we are reserving so much stack space only to account for the cases where a malicious PVF managed to overstep our wasm deterministic stack instrumentation. ↩
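A sketch of what that could look like (hypothetical helper; the address and alignment handling are assumptions, not the PR's code):

```rust
use std::io;

/// Release the unused upper part of a pooled worker's stack after an execution,
/// so that pages dirtied above the normal watermark are returned to the kernel
/// instead of staying resident.
unsafe fn release_upper_stack(region_start: *mut libc::c_void, len: usize) -> io::Result<()> {
    // SAFETY: the caller guarantees that `region_start..region_start + len` is a
    // valid, page-aligned anonymous mapping (the unused upper part of the stack).
    //
    // MADV_DONTNEED tells the kernel the pages can be dropped; anonymous memory
    // reads as zero afterwards, so the next deep-stack execution only pays for
    // re-zeroing, not for swapping pages in and out.
    let ret = unsafe { libc::madvise(region_start, len, libc::MADV_DONTNEED) };
    if ret != 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}
```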
Considering that this limit should essentially never be hit in normal circumstances, and that even if it is hit, with the default limit of only 2 threads (from what you've said in the other comment) it'd be only ~512MB in the worst case, I think that's totally fine as-is.
I'd really like to see some numbers on whether this has any performance impact, but otherwise it looks good to me.
Looks sensible. A burn-in on Versi would be good before merge (just to make sure no crazy amounts of physical memory are acquired or anything like that).
@tdimitrov can you take care of burning this in on a validator group on Versi please?
FWIW, just pushed a change with a thread pool as requested by @koute.
Sure! @pepyakin, anything specific to look for in the logs/metrics? (besides the memory usage, of course)
Can't think of anything else besides that.
Yeah, if nodes don't crash and memory usage does not spike, everything should be fine.
Thanks @tdimitrov!
* master:
  zombienet: try to fix parachains upgrade test (#5724)
  Update dependencies (companion for substrate#11722) (#5731)
  Update metric name and doc (#5716)
  Bump reqwest from 0.11.10 to 0.11.11 (#5732)
  add release-engineering to CI files' reviewers (#5733)
  Bump parity-scale-codec from 3.1.2 to 3.1.5 (#5720)
  Add checklist item (#5715)
  Fix 5560: add support for a new `staking-miner info` command (#5577)
  Bump `wasmtime` to 0.38.0 and `zstd` to 0.11.2 (companion for substrate#11720) (#5707)
  pvf: ensure enough stack space (#5712)
  Bump generic-array from 0.12.3 to 0.12.4 in /bridges/fuzz/storage-proof (#5648)
  pvf: unignore `terminates_on_timeout` test (#5722)
  Bump proc-macro2 from 1.0.39 to 1.0.40 (#5719)
  pass $COMPANION_OVERRIDES to check_dependent_project (#5708)
  Bump thread_local from 1.1.0 to 1.1.4 in /bridges/fuzz/storage-proof (#5687)
  Bump quote from 1.0.18 to 1.0.19 (#5700)
  Rococo: add new pallet-beefy-mmr API (companion for substrate#11406) (#5516)
  Update metric before bailing out (#5706)
  Add publish docker staking-miner (#5710)
* pvf: ensure enough stack space
* fix typos
  Co-authored-by: Andronik <[email protected]>
* Use rayon to cache the thread
  Co-authored-by: Andronik <[email protected]>
This PR fixes a potential stack overflow.
Exploiting this is not trivial. We impose a limit of 65536 logical items on the wasm value stack. If we assume that each value was spilled on the stack (which is not trivial to achieve), then that would be only 512 KiB1. In case this was exploited, the process which hosts the execution worker would be aborted, and the PVF host anticipates that2.
Because exploitation is not trivial, I've not included a test. In order to test locally, I had to lower the limits to actually trigger this condition.
Footnotes
1. which makes me think that this number could be bumped. ↩
2. although it can still be problematic if core dumps are enabled. ↩
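As a back-of-the-envelope check of the 512 KiB figure, assuming each spilled wasm value occupies 8 bytes on the native stack (the per-value size is an assumption of this estimate, not something stated in the description):

```rust
// 65_536 logical items * 8 bytes per spilled value = 524_288 bytes = 512 KiB.
const LOGICAL_STACK_LIMIT: usize = 65_536;
const ASSUMED_BYTES_PER_VALUE: usize = 8;

fn main() {
    let worst_case = LOGICAL_STACK_LIMIT * ASSUMED_BYTES_PER_VALUE;
    assert_eq!(worst_case, 512 * 1024);
    println!("worst-case spilled value stack: {} KiB", worst_case / 1024);
}
```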