-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor device auth schema #1344
Conversation
Requests are now short-lived so that we can use `user_code` as a primary key. Authz is implemented and working for all but access token lookup, which needs to be looked up both by `token` (primary key) and the unique combination of `client_id` and `device_code`.
access_token: DeviceAccessToken, | ||
) -> CreateResult<DeviceAccessToken> { | ||
assert_eq!(authz_user.id(), access_token.silo_user_id); | ||
opctx.authorize(authz::Action::CreateChild, authz_user).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check feels a little weird, since DeviceAccessToken
isn't declared by lookup_resource!
as a child of SiloUser
. Suggestions for a better check would be welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could create another synthetic resource analogous to ConsoleSessionList
(DeviceAccessTokenList
) but I admit the boilerplate-to-value ratio on those is getting worse...
To be clear, would we expect this function to be invoked by the user themselves or by Nexus's external-authenticator? I assumed the latter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this function is called as part of the confirmation in the browser, so it's made by an authenticated user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Despite some questions below I really like the changes here!
); | ||
|
||
-- Matches the primary key on device authorization records. | ||
-- The UNIQUE constraint is critical for ensuring that at most | ||
-- This UNIQUE constraint is critical for ensuring that at most |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is no longer true (that is, removal from the device_auth_request
guarantees that now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it is still true with respect to this table considered alone. There shouldn't be any code paths now for which this invariant is broken, but it still seems like a good idea to leave it in place for future us (me).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I wasn't suggesting removing the constraint, just potentially updating the comment, but it's fine as is too)
access_token: DeviceAccessToken, | ||
) -> CreateResult<DeviceAccessToken> { | ||
assert_eq!(authz_user.id(), access_token.silo_user_id); | ||
opctx.authorize(authz::Action::CreateChild, authz_user).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could create another synthetic resource analogous to ConsoleSessionList
(DeviceAccessTokenList
) but I admit the boilerplate-to-value ratio on those is getting worse...
To be clear, would we expect this function to be invoked by the user themselves or by Nexus's external-authenticator? I assumed the latter.
// TODO-security: authz | ||
pub async fn device_auth_get_token( | ||
/// Look up a granted device access token. | ||
/// Note: since this lookup is not by a primary key or name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the client allowed to fetch the token more than once?
Wondering out loud if there should be another table of short-lived records here whose primary key is (client_id, device_id), which just has a foreign key into the token table. If they're only ever allowed to fetch the token once, that might also be useful to enforce that. If this doesn't seem appealing, ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A good question. I don't think there's an explicit restriction on fetching the token more than once, though I do think it would be odd. I'm not very keen to add a new table, though.
nexus/src/db/datastore.rs
Outdated
self.pool_authorized(opctx) | ||
.await? | ||
.transaction(move |conn| { | ||
delete_request.execute(conn)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there's a window here where this could happen:
- request A comes in to verify the request and create a new token
- request A fetches the device auth request
- request B comes in to verify the request and create a new token, fetches the device auth request row, and executes this transaction
- request A executes this transaction
I think the result would be two tokens created from the same device auth request.
I think the easiest way to fix that might be to bail out at L4329 if execute()
reports having deleted any number of rows other than 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, fixed in 152fd2a. Please LMK if I got the error handling wrong.
nexus/src/db/datastore.rs
Outdated
Ok(insert_token.get_result(conn)?) | ||
} else { | ||
Err(TxnError::CustomError( | ||
TokenGrantError::ConcurrentRequests, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is okay. I would probably have this say:
0 rows back => 404 -- there's no reason to expose this as "concurrent update" -- it just doesn't exist now
more than 1 row back => 500
If you want to do that, vpc_update_firewall_rules()
might be an example to follow? You basically need to return something from the transaction that's specific enough to match on it in the map_err()
below to decide whether to use Error::internal_error
or authz_resource.not_found()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, sorry. Trying again in bf53d38.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, this looks better. The case I'm still not sure about is when we find more than one item. I'd expect this to be a 500 (with a message like "unexpectedly found multiple device auth requests for the same user code") because I don't see how it could ever actually happen. The current code produces a 400 that says something about concurrent requests. (When we were talking about "concurrent requests" earlier, I think I meant concurrent verify requests, which is what could land you in the other case (which is now a 404, which is good). I'm not sure how concurrent requests could land you in this case...unless by "concurrent requests" you mean "we managed to have multiple [concurrent] device auth requests with the same user code", which is true, but that still seems like a server error.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct on all accounts, thank you. Hopefully last try: 8ce7e42.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
Added a new package, crucible-dtrace that pulls from buildomat a package that contains a set of DTrace scripts. These scripts are extracted into the global zone at /opt/oxide/crucible_dtrace/ Update Crucible to latest includes these updates: Clean up dependency checking, fixing space leak (#1372) Make a DTrace package (#1367) Use a single context in all messages (#1363) Remove `DownstairsWork`, because it's redundant (#1371) Remove `WorkState`, because it's implicit (#1370) Do work immediately upon receipt of a job, if possible (#1366) Move 'do work for one job' into a helper function (#1365) Remove `DownstairsWork` from map when handling it (#1361) Using `block_in_place` for IO operations (#1357) update omicron deps; use re-exported dropshot types in oximeter-producer configuration (#1369) Parameterize more tests (#1364) Misc cleanup, remove sqlite references. (#1360) Fix `Extent::close` docstring (#1359) Make many `Region` functions synchronous (#1356) Remove `Workstate::Done` (unused) (#1355) Return a sorted `VecDeque` directly (#1354) Combine `proc_frame` and `do_work_for` (#1351) Move `do_work_for` and `do_work` into `ActiveConnection` (#1350) Support arbitrary Volumes during replace compare (#1349) Remove the SQLite backend (#1352) Add a custom timeout for buildomat tests (#1344) Move `proc_frame` into `ActiveConnection` (#1348) Remove `UpstairsConnection` from `DownstairsWork` (#1341) Move Work into ConnectionState (#1340) Make `ConnectionState` an enum type (#1339) Parameterize `test_repair.sh` directories (#1345) Remove `Arc<Mutex<Downstairs>>` (#1338) Send message to Downstairs directly (#1336) Consolidate `on_disconnected` and `remove_connection` (#1333) Move disconnect logic to the Downstairs (#1332) Remove invalid DTrace probes. (#1335) Fix outdated comments (#1331) Use message passing when a new connection starts (#1330) Move cancellation into Downstairs, using a token to kill IO tasks (#1329) Make the Downstairs own per-connection state (#1328) Move remaining local state into a `struct ConnectionState` (#1327) Consolidate negotiation + IO operations into one loop (#1322) Allow replacement of a target in a read_only_parent (#1281) Do all IO through IO tasks (#1321) Make `reqwest_client` only present if it's used (#1326) Move negotiation into Downstairs as well (#1320) Update Rust crate clap to v4.5.4 (#1301) Reuse a reqwest client when creating Nexus clients (#1317) Reuse a reqwest client when creating repair client (#1324) Add % to keep buildomat happy (#1323) Downstairs task cleanup (#1313) Update crutest replace test, and mismatch printing. (#1314) Added more DTrace scripts. (#1309) Update Rust crate async-trait to 0.1.80 (#1298)
Update Crucible and Propolis to the latest Added a new package, crucible-dtrace that pulls from buildomat a package that contains a set of DTrace scripts. These scripts are extracted into the global zone at /opt/oxide/crucible_dtrace/ Crucible latest includes these updates: Clean up dependency checking, fixing space leak (#1372) Make a DTrace package (#1367) Use a single context in all messages (#1363) Remove `DownstairsWork`, because it's redundant (#1371) Remove `WorkState`, because it's implicit (#1370) Do work immediately upon receipt of a job, if possible (#1366) Move 'do work for one job' into a helper function (#1365) Remove `DownstairsWork` from map when handling it (#1361) Using `block_in_place` for IO operations (#1357) update omicron deps; use re-exported dropshot types in oximeter-producer configuration (#1369) Parameterize more tests (#1364) Misc cleanup, remove sqlite references. (#1360) Fix `Extent::close` docstring (#1359) Make many `Region` functions synchronous (#1356) Remove `Workstate::Done` (unused) (#1355) Return a sorted `VecDeque` directly (#1354) Combine `proc_frame` and `do_work_for` (#1351) Move `do_work_for` and `do_work` into `ActiveConnection` (#1350) Support arbitrary Volumes during replace compare (#1349) Remove the SQLite backend (#1352) Add a custom timeout for buildomat tests (#1344) Move `proc_frame` into `ActiveConnection` (#1348) Remove `UpstairsConnection` from `DownstairsWork` (#1341) Move Work into ConnectionState (#1340) Make `ConnectionState` an enum type (#1339) Parameterize `test_repair.sh` directories (#1345) Remove `Arc<Mutex<Downstairs>>` (#1338) Send message to Downstairs directly (#1336) Consolidate `on_disconnected` and `remove_connection` (#1333) Move disconnect logic to the Downstairs (#1332) Remove invalid DTrace probes. (#1335) Fix outdated comments (#1331) Use message passing when a new connection starts (#1330) Move cancellation into Downstairs, using a token to kill IO tasks (#1329) Make the Downstairs own per-connection state (#1328) Move remaining local state into a `struct ConnectionState` (#1327) Consolidate negotiation + IO operations into one loop (#1322) Allow replacement of a target in a read_only_parent (#1281) Do all IO through IO tasks (#1321) Make `reqwest_client` only present if it's used (#1326) Move negotiation into Downstairs as well (#1320) Update Rust crate clap to v4.5.4 (#1301) Reuse a reqwest client when creating Nexus clients (#1317) Reuse a reqwest client when creating repair client (#1324) Add % to keep buildomat happy (#1323) Downstairs task cleanup (#1313) Update crutest replace test, and mismatch printing. (#1314) Added more DTrace scripts. (#1309) Update Rust crate async-trait to 0.1.80 (#1298) Propolis just has this one update: Allow boot order config in propolis-standalone --------- Co-authored-by: Alan Hanson <[email protected]>
Fixes #1255 and fixes #1285.
Records in the
device_auth_request
table are now short-lived so that we can useuser_code
as a primary key.Datastore authz is implemented except for lookups in the
device_access_token
table, since we need lookup by both thetoken
(primary key) and the unique combination ofclient_id
anddevice_code
. The current lookup machinery does not appear to make this easy; suggestions for workarounds would be welcome.