Audit how fxa-client recovers from expired push subscriptions #3127
Comments
Here's where desktop applies this logic, as part of refreshing its list of peer devices: Notice how it just reaches into the push service, zaps the subscription and requests a new one. We can't do the same for the FxA Rust component, because the push service lives outside of FxA and is managed by the app. So we probably need some events that cause the app to re-register the push subscription. |
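One possible shape for such an event is a callback that the app registers and the component invokes while refreshing its device list. The sketch below is illustrative only: it assumes each device record is a dict carrying isCurrentDevice and pushEndpointExpired fields, and the names find_expired_local_device, refresh_devices and on_push_expired are hypothetical, not part of the fxa-client API.

```python
from typing import Callable, Iterable, Mapping, Optional


def find_expired_local_device(devices: Iterable[Mapping]) -> Optional[Mapping]:
    """Return this client's own record if the server flagged its push endpoint as expired."""
    for device in devices:
        # Assumed record fields; only our own device is interesting here.
        if device.get("isCurrentDevice") and device.get("pushEndpointExpired"):
            return device
    return None


def refresh_devices(
    devices: Iterable[Mapping],
    on_push_expired: Callable[[Mapping], None],
) -> bool:
    """Hypothetical refresh hook: notify the app instead of touching the push service."""
    expired = find_expired_local_device(devices)
    if expired is None:
        return False
    # The app owns the push service, so it is the one that re-registers.
    on_push_expired(expired)
    return True
```

The component only reports the condition; what the app does in the callback (dropping the old subscription, requesting a new one, sending the new endpoint back to FxA) stays on the app side, which matches the ownership split described above.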
@mhammond suggested in slack, that we could also take a look at the device records on the server, look for ones that have |
@ckarlof described a situation recently where he hadn't used Fenix for several days, and when he tried to send a tab from desktop to Fenix it did not arrive. Launching the Fenix app and allowing it to sync seemed to fix send-tab without any other interaction, and subsequent sends from desktop to Fenix worked as expected. What I suspect happened here is that Fenix's push subscription became invalid while the app was not in use, meaning that it stopped being able to receive sent tabs in a timely manner. When the app was started and synced, it discovered the expired push endpoint, automatically repaired it, and was once again able to properly receive tabs. Interestingly, Chris also reported that Fennec did not seem to have this problem, which makes me wonder if there's some background smarts in the Fennec app to keep the push subscriptions active while the app is not in use. |
I tested it again. Starting from Desktop, I sent a tab to Fennec and Fenix. I got no notification for either from my phone. After opening Fennec, and triggering a manual sync, I got the tab I sent to Fennec. I did a manual sync on Fenix, and did not receive the tab. Trying again on Fennec, I now get instant notifications for tabs sent to Fennec. Fenix is still broken in receiving tabs sent from Desktop. |
@jonalmeida are there Android logs that @ckarlof can pull to figure out what's happening here?
I don't think Fenix polls the send tab commands on manual sync. Maybe we should? On the other hand, if the user has to rely on Sync Now to get their tabs we've already failed.
On the Desktop side, in the browser toolbox |
@eoger Yes, you should see a warning log in there if that flag was propagated to us from
I mentioned this in another ticket I think, but we try to avoid DDOS-ing FxA if we already went down this flow within the last 24 hours. So you might not see that log again until then. |
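For readers following along, the 24-hour guard described here could look roughly like the sketch below. It is a minimal illustration, not the android-components implementation: the class name ResubscribeThrottle and its methods are hypothetical, and the state is kept in memory, whereas a real client would presumably persist the timestamp across app restarts.

```python
import time
from typing import Callable, Optional

ONE_DAY_SECONDS = 24 * 60 * 60


class ResubscribeThrottle:
    """Hypothetical guard: allow at most one re-subscription attempt per interval."""

    def __init__(
        self,
        min_interval: float = ONE_DAY_SECONDS,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock  # injectable so the behaviour can be tested
        self._last_attempt: Optional[float] = None

    def should_attempt(self) -> bool:
        """Return True and record the attempt if enough time has passed."""
        now = self._clock()
        if (
            self._last_attempt is not None
            and now - self._last_attempt < self._min_interval
        ):
            # Too soon since the last attempt: skip to avoid hammering the server.
            return False
        self._last_attempt = now
        return True
```

With a guard like this, a second expired-endpoint signal inside the window is silently skipped, which would explain not seeing the warning log again until the window has elapsed.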
It was in a now-resolved comment in this google doc, for anyone going hunting for that context. |
Sorry, I missed replying to this. The Sync Now button should be polling for device commands. If that's not working, then something else there is in a bad state. |
Sounds like there's more going on in @ckarlof's case than my initial theory, so let's spin that off into a standalone issue to ensure we don't lose track of the overall thread here: https://bugzilla.mozilla.org/show_bug.cgi?id=1641147 |
Turns out I had a Firefox Beta install that's in a similar boat to what I described above - connected to my account, but idle for a couple of weeks. Here's what I observed:
So as far as I can tell, this Fenix instance is not successfully restoring its push endpoint on the FxA server. I'll dig into logcat and see what else I can find out. |
I found instances of this log line being emitted from my Fenix, which is good, but it doesn't seem to be successfully resolving the problem. |
@jonalmeida so I can confirm from logs that my Fenix hit the code you linked here to discover that its push subscription is expired, and it appears to actually get a new subscription with the push service. Some log snippets:
However, this seems to just repeat each time the device syncs. Naively, I would expect the |
Ryan, this roughly captures what I experienced as well. |
I filed mozilla-mobile/android-components#7143 to follow up further on this specific thread of reasoning, since it's easy for individual items to get lost in this kind of general "audit" bug. |
Sorry about the slow response!
This is interesting! It's good to also know our checks for the
This flag is a bit special, in that, we don't request a new subscription change immediately. We nuke the app's push token to get a fresh token from the push provider (Firebase) to trigger the re-subscription. That's what we see above.
That's also why you can see the marketing SDK recover from this with the new push token that they send to their own push servers. The
This is great! We received the token and provided it to native layer.
It seems here we're trying to perform some action on the account (send tab I assume) and we run into this flow again. What's missing is the call to check if our subscriptions are still valid. This happens when the feature is initialized, which in turn happens on app startup in Fenix. What should happen then is that during the next cold boot (swipe the app away before starting the app), we trigger that mechanism. This will fire the @rfk did your device eventually get push working in a day? (You could try now since that's how long it took me to get back 😅 ) |
If this isn't happening, then we would have to look into why |
Nope, the FxA server still reports |
Right, so when we provide a new push token, we're not getting an indication that the subscriptions need to be renewed to pass it on to the consumers. Btw, thanks for looking into this @rfk ! |
Glad to be able to reproduce! Just re-upping, I think we should spin this into a dedicated bug (e.g. mozilla-mobile/android-components#7143) for followup since I expect that we may have several other threads of discussion under the "audit" banner of this issue. |
I spent a bit of time digging into how the moving parts fit together here on iOS; /cc @garvankeeley for context. Some observations (with the caveat that I could easily be missing something):
I'm going to file a followup bug to get some telemetry around the |
When this is called application-services/components/fxa-client/ios/FxAClient/FxAccountDeviceConstellation.swift Line 33 in 9214e71
localDevice.subscriptionExpired and if false, log that to Sentry. Thoughts?
Or if we know we should be doing something at that point, we could log to Sentry and take necessary action to refresh the subscription. |
I think it would be valuable. I believe @eoger is also investigating whether we can find out how many iOS devices might be in this state from server data, and if the answer is "lots" then maybe we don't need any logging here.
I don't know the code well, but I think if we flag the push setup as needing recovery similar to what we do on decryption errors it should help. |
See also #3199 for an appservices-side bug about doing something other than debug logging here. |
From conversation in slack, yes, it would be a good idea to do this periodically on iOS as well. Followup bug incoming. |
We discussed this bug in triage today, and it feels like we've done all the "auditing" that we can. Let's close this out in favour of the identified followup bugs, which I've edited into the issue description and will summarize here as well for completeness:
|
Thanks to the vagaries of The Cloud, it's possible for an FxA client device to register a push subscription endpoint with the FxA server, but for that push subscription endpoint to later become invalid. The FxA server detects this when it tries to send a push notification to that device and fails, and it reacts by setting a pushEndpointExpired flag on the device record.

What mechanisms do we have for detecting this on the client and updating the push subscription, and are they working as expected?

Analysis of push notification metrics for send-tab suggests that up to 20% of attempts to send a tab fail to send the corresponding push notification, and that it's mostly due to this "expired push endpoint" issue.

Summary of identified follow-up work:
onSubscriptionChanged handler mozilla-mobile/android-components#7143)
pushEndpointExpired flag; let's add handling to see how often it happens in practice. (Swift FxAccountDeviceConstellation should notify when push endpoint is expired #3199).
┆Issue is synchronized with this Jira Task
┆Story Points: 8
┆Epic: Trusted Send Tab Telemetry
┆Sprint: SYNC - end 2020-06-05