Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat/SYNC-4244_stale #1546

Merged
merged 11 commits into from
May 9, 2024
Merged

feat/SYNC-4244_stale #1546

merged 11 commits into from
May 9, 2024

Conversation

jrconlin
Copy link
Member

Description

Logs to STDERR a message indicating the FxA UID and client state for any transaction that uses a stale value.
This also generates a new request.error.invalid_client_state metric to track how many instances of this sort of thing we see.

Issue(s)

Closes: SYNC-4244

@jrconlin jrconlin requested review from pjenvey and taddes April 25, 2024 22:49
Comment on lines 400 to 407
if !old_client_states.is_empty() {
info!(
"Tokenserver user has old client states";
"uid" => raw_user.uid,
"count" => old_client_states.len()
)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty normal situation, is even more common with the current state of the purge script so I don't see it being that useful?

Suggested change
if !old_client_states.is_empty() {
info!(
"Tokenserver user has old client states";
"uid" => raw_user.uid,
"count" => old_client_states.len()
)
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the things that was discussed prior was that we didn't have a clear idea how many users were being impacted by the a user client state change. Once the purge script runs, I would suspect that there would be less clients with that potential state. While normal, we don't really know how common it is.

I'm fine specifying this as a metric instead, if that might be useful. If this is a very common occurrence or we have other ways to address the concern about users with excessive old client states, then I'm fine dropping this as well. I'd also be fine setting a reporting threshold of old_client_states.len() > 1 for this log message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding was we wanted to know how many users were affected by https://bugzilla.mozilla.org/show_bug.cgi?id=1892904 triggered by invalid-client-state w/ a description of "Unacceptable client-state value stale value".

old_client_states is soley used to trigger that error, but it's only triggered when the current client_state matches one of those old ones. I'm certain the large majority of users with non empty old_client_states (especially w/ the state of the purge script) don't actually hit that error -- the fact that it's non empty doesn't tell us very much in my opinion -- just that a user changed a password at some point and that replaced_at record hasn't been cleaned up by purge (yet).

Once the purge script catches up the only old_client_states we should be seeing are those within the purge script's grace period of 24 hours.

..
}
) {
Some("request.error.invalid_client_state".to_owned())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth having a different metric (or preferably tag, but we don't have ReportableError::tags in this codebase yet which let's you do that) for the specific "Unacceptable client-state value stale value" issue in question.

Though I suppose we can alternatively look at the warn log messages for a count of those 🤷‍♂️

Copy link
Member Author

@jrconlin jrconlin May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's hold off on adding a tag for that. I see that there's already a ticket for that work.

In any case, I'll add a different ticket to do that and we can add the tag once that lands.

@jrconlin jrconlin merged commit 8307955 into master May 9, 2024
8 checks passed
@jrconlin jrconlin deleted the feat/SYNC-4244_stale branch May 9, 2024 01:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants