-
Notifications
You must be signed in to change notification settings - Fork 252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(utd_hook): Fix regression causing retry to report false late decrypt #4252
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #4252 +/- ##
==========================================
- Coverage 84.95% 84.92% -0.03%
==========================================
Files 274 274
Lines 29768 29769 +1
==========================================
- Hits 25288 25282 -6
- Misses 4480 4487 +7 ☔ View full report in Codecov by Sentry. 🚨 Try these New Features:
|
According to @BillCarsonFr: this bug should have caused the UTD rate reported to Posthog to (incorrectly) drop significantly. We haven't actually observed that -- possibly because the relevant change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks correct to me; sorry for missing this before.
It would be good if we had a regression test.
None | ||
} else { | ||
trace!( | ||
kind = ?event.kind, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is probably a bit verbose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not changed by this PR. Any other reason to remove it? is it spamming the RS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Contains most of the event's data that's not privacy-sensitive, so it will be spammy. This line can be reached multiple times for each undecrypted event that failed to re-decrypt again, so it will be super spammy, please remove it.
Actually there is no grace period on rust sdk UTD reporting like on other platform; On other platforms, an event decrypted in less than 4s is graced out and not reported. It is not the case in rust sdk, so the UTD would be reported immediatly but as a temporary UTD (with |
Added a regression test |
// Notify observers that we managed to eventually decrypt an event. | ||
if let Some(hook) = unable_to_decrypt_hook { | ||
hook.on_late_decrypt(&remote_event.event_id, *utd_cause).await; | ||
if let matrix_sdk::deserialized_responses::TimelineEventKind::UnableToDecrypt { utd_info, ..} = event.kind { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use use
, not the fully qualified name here
None | ||
} else { | ||
trace!( | ||
kind = ?event.kind, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Contains most of the event's data that's not privacy-sensitive, so it will be spammy. This line can be reached multiple times for each undecrypted event that failed to re-decrypt again, so it will be super spammy, please remove it.
{ | ||
let utds = hook.utds.lock().unwrap(); | ||
assert_eq!(utds.len(), 1); | ||
// Without the regression fix this would be Some(time to decrypt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Without the regression fix this would be Some(time to decrypt) | |
// This is the main thing we're testing: if this wasn't identified as a definite UTD, this would be `Some(..)`. |
match olm_machine | ||
.try_decrypt_room_event(raw.cast_ref(), room_id, &decryption_settings) | ||
.await? | ||
{ | ||
RoomEventDecryptionResult::Decrypted(decrypted) => Ok(decrypted.into()), | ||
RoomEventDecryptionResult::UnableToDecrypt(utd_info) => { | ||
Ok(TimelineEvent::new_utd_event(raw.clone(), utd_info)) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change needed, out of curiosity? This is for the test timeline; is it operating that differently from the non-test timeline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's needed to make the regression test ~"fails correctly".
If not the test never detects the behavior change because it is using the mock Decryptor
that continues to fail with an error when there is an UTD instead of return a OK UTDEvent type.
The underlying problem is that we use a big catch everything Error for all possible errors of the SDK. And here the change that created the regression should have resulted in a change of signature of the trait. It is not anymore Result<, ALLERRORS> but a Result <,SubsetOfErrors>?
(Great catch and thanks for fixing, btw) |
There has been a recent change on
Decryptor#decrypt_event_impl
causing the function to return an TimelineEvent of kind unable to decrypt instead of failing with an error.The
late_decrypt
detection code was not changed, causing any retry to mark UTDs as late decrypt.Signed-off-by: