Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat/SYNC-4244_stale #1546

Merged
merged 11 commits into from
May 9, 2024
1 change: 1 addition & 0 deletions syncserver/src/tokenserver/extractors.rs
Original file line number Diff line number Diff line change
Expand Up @@ -113,6 +113,7 @@ impl TokenserverRequest {
.contains(&self.auth_data.client_state)
{
let error_message = "Unacceptable client-state value stale value".to_owned();
warn!("Client attempted stale value"; "uid"=> self.user.uid, "client_state"=> self.user.client_state.clone());
return Err(TokenserverError::invalid_client_state(error_message));
}

Expand Down
8 changes: 8 additions & 0 deletions tokenserver-common/src/error.rs
Original file line number Diff line number Diff line change
Expand Up @@ -271,6 +271,14 @@ impl ReportableError for TokenserverError {
TokenType::BrowserId => Some("request.error.browser_id".to_owned()),
TokenType::Oauth => Some("request.error.oauth".to_owned()),
}
} else if matches!(
self,
TokenserverError {
status: "invalid-client-state",
..
}
) {
Some("request.error.invalid_client_state".to_owned())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth having a different metric (or preferably tag, but we don't have ReportableError::tags in this codebase yet which let's you do that) for the specific "Unacceptable client-state value stale value" issue in question.

Though I suppose we can alternatively look at the warn log messages for a count of those 🤷‍♂️

Copy link
Member Author

@jrconlin jrconlin May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's hold off on adding a tag for that. I see that there's already a ticket for that work.

In any case, I'll add a different ticket to do that and we can add the tag once that lands.

} else {
None
}
Expand Down
2 changes: 2 additions & 0 deletions tokenserver-db/src/lib.rs
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
extern crate diesel;
#[macro_use]
extern crate diesel_migrations;
#[macro_use]
extern crate slog_scope;

mod error;
pub mod mock;
Expand Down
16 changes: 14 additions & 2 deletions tokenserver-db/src/models.rs
Original file line number Diff line number Diff line change
Expand Up @@ -389,14 +389,22 @@ impl TokenserverDb {
let raw_user = raw_users[0].clone();

// Collect any old client states that differ from the current client state
let old_client_states = {
let old_client_states: Vec<String> = {
raw_users[1..]
.iter()
.map(|user| user.client_state.clone())
.filter(|client_state| client_state != &raw_user.client_state)
.collect()
};

if !old_client_states.is_empty() {
info!(
"Tokenserver user has old client states";
"uid" => raw_user.uid,
"count" => old_client_states.len()
)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty normal situation, is even more common with the current state of the purge script so I don't see it being that useful?

Suggested change
if !old_client_states.is_empty() {
info!(
"Tokenserver user has old client states";
"uid" => raw_user.uid,
"count" => old_client_states.len()
)
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the things that was discussed prior was that we didn't have a clear idea how many users were being impacted by the a user client state change. Once the purge script runs, I would suspect that there would be less clients with that potential state. While normal, we don't really know how common it is.

I'm fine specifying this as a metric instead, if that might be useful. If this is a very common occurrence or we have other ways to address the concern about users with excessive old client states, then I'm fine dropping this as well. I'd also be fine setting a reporting threshold of old_client_states.len() > 1 for this log message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding was we wanted to know how many users were affected by https://bugzilla.mozilla.org/show_bug.cgi?id=1892904 triggered by invalid-client-state w/ a description of "Unacceptable client-state value stale value".

old_client_states is soley used to trigger that error, but it's only triggered when the current client_state matches one of those old ones. I'm certain the large majority of users with non empty old_client_states (especially w/ the state of the purge script) don't actually hit that error -- the fact that it's non empty doesn't tell us very much in my opinion -- just that a user changed a password at some point and that replaced_at record hasn't been cleaned up by purge (yet).

Once the purge script catches up the only old_client_states we should be seeing are those within the purge script's grace period of 24 hours.

// Make sure every old row is marked as replaced. They might not be, due to races in row
// creation.
for old_user in &raw_users[1..] {
Expand Down Expand Up @@ -463,7 +471,11 @@ impl TokenserverDb {
// The most up-to-date user doesn't have a node and is retired. This is an internal
// service error for compatibility reasons (the legacy Tokenserver returned an
// internal service error in this situation).
(_, None) => Err(DbError::internal("Tokenserver user retired".to_owned())),
(_, None) => {
let uid = raw_user.uid;
warn!("Tokenserver user retired"; "uid" => &uid);
Err(DbError::internal("Tokenserver user retired".to_owned()))
}
}
}
}
Expand Down