Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

calls into Oso can block #1584

Open
davepacheco opened this issue Aug 12, 2022 · 1 comment
Open

calls into Oso can block #1584

davepacheco opened this issue Aug 12, 2022 · 1 comment

Comments

@davepacheco
Copy link
Collaborator

davepacheco commented Aug 12, 2022

For background, see this comment. Oso's is_authorized() call can block on locks. We're currently doing this from async code in tokio core threads. Instead, this needs to happen on blocking threads.

@davepacheco
Copy link
Collaborator Author

Via @jgallagher, the tokio Mutex docs say:

Contrary to popular belief, it is ok and often preferred to use the ordinary Mutex from the standard library in asynchronous code.
...
The feature that the async mutex offers over the blocking mutex is the ability to keep it locked across an .await point.
...
Note that, although the compiler will not prevent the std Mutex from holding its guard across .await points in situations where the task is not movable between threads, this virtually never leads to correct concurrent code in practice as it can easily lead to deadlocks.

The shared-state tutorial says:

A synchronous mutex will block the current thread when waiting to acquire the lock. This, in turn, will block other tasks from processing. However, switching to tokio::sync::Mutex usually does not help as the asynchronous mutex uses a synchronous mutex internally
...
As a rule of thumb, using a synchronous mutex from within asynchronous code is fine as long as contention remains low and the lock is not held across calls to .await.

Along those lines, blocking is fine in async code as long as it's not for long. I think the risk is that it winds up being longer than expected (resulting in latency bubbles) or pathologically long (like deadlocks), potentially turning a recoverable problem into a much worse one.

@plotnick mentioned:

Oso shouldn't block for "long" doing authz checks. It needs to write-lock the KB when rules are added, but checks should be mostly non-blocking, AFAIK.

So this is probably not a big practical problem aside from bugs like the deadlock we ran into. But it does seem to expose us to a risk of latency bubbles (if there ever is write contention) and potentially more catastrophic failure when there is a deadlock or the like.

leftwo pushed a commit that referenced this issue Jan 27, 2025
Crucible changes are:
Allow read only activation with less than three downstairs (#1608)
Tweaks to automatic flush (#1613)
Update Rust crate twox-hash to v2 (#1547)
Remove `LastFlushAck` (#1603)
Correctly print 'connecting' state (#1612)
Make live-repair part of invariants checks (#1610)
Simplify mend region selection (#1606)
Generic read test for crutest (#1609)
Always remove skipped jobs from dependencies (#1604)
Add libsqlite3-dev install step to Github Actions CI (#1607)
Move Nexus notification to standalone task (#1584)
DTrace cleanup. (#1602)
Reset completed work Downstairs on a `Barrier` operation (#1601)
Upstairs state machine refactoring (3/3) (#1577)

Propolis changes are:
Wire up initial support for AMD perf counters
build: upgrade tokio to 1.40.0 (#836)
build: explicitly install libsqlite3-dev in CI (#834)
add JSON output format to cpuid-gen (#832)
leftwo added a commit that referenced this issue Jan 27, 2025
Crucible changes are:
Allow read only activation with less than three downstairs (#1608)
Tweaks to automatic flush (#1613)
Update Rust crate twox-hash to v2 (#1547)
Remove `LastFlushAck` (#1603)
Correctly print 'connecting' state (#1612)
Make live-repair part of invariants checks (#1610) Simplify mend region
selection (#1606)
Generic read test for crutest (#1609)
Always remove skipped jobs from dependencies (#1604) Add libsqlite3-dev
install step to Github Actions CI (#1607) Move Nexus notification to
standalone task (#1584) DTrace cleanup. (#1602)
Reset completed work Downstairs on a `Barrier` operation (#1601)
Upstairs state machine refactoring (3/3) (#1577)

Propolis changes are:
Wire up initial support for AMD perf counters
build: upgrade tokio to 1.40.0 (#836)
build: explicitly install libsqlite3-dev in CI (#834) add JSON output
format to cpuid-gen (#832)

---------

Co-authored-by: Alan Hanson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant