Skip to content
This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

Close #26041: Re-set TrackingProtectionPolicy after Nimbus SDK is initialized #26228

Merged
merged 2 commits into from
Aug 9, 2022

Conversation

jonalmeida
Copy link
Contributor

@jonalmeida jonalmeida commented Jul 28, 2022

There are three issues here that we have uncovered while investigating
this bug:

  1. Settings.kt has a lazy block around enabledTotalCookieProtection
    which ends up caching the first result it evaluates.
  2. The FeatureHolder within FxNimbus caches the incorrectly
    evaluated value and returns this value hence forth.
  3. Nimbus is not ready to return a result for an engine experiment
    when we need it early on in the dependency tree initialization.

There are multiple systems that require engine to be initialized for
them to work (e.g. Glean, Profiler, concept-fetch). In our TCP,
experiment, we need to apply these engine settings during the engine
initialization. So when we try and evaluate Nimbus that early on, it
has not had time to initialize itself correctly or even use the
engine's concept-fetch client to return the correct experiment result.
This bug is made worse because of the first two caching bugs where we
are always holding onto a cached value of the wrong result.

Our temporary solution is to:

  1. Remove the lazy around Settings.enabledTotalCookieProtection.
  2. Set the FxNimbus.api value right after we are done initializing
    FxNimbus and NimbusApi so that all future queries to FxNimbus
    will be made against a real instance of NimbusApi. This is a
    short-term fix for the FeatureHolder caching bug.
  3. Set a new TrackingProtectionPolicy that will evaluate Nimbus now
    that it is in the correct state when receive the
    NimbusInterface.Observer.onUpdatesApplied.

Co-authored-by: jhugman
Co-authored-by: Christian Sadilek

Pull Request checklist

  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Screenshots: This PR includes screenshots or GIFs of the changes made or an explanation of why it does not
  • Accessibility: The code in this PR follows accessibility best practices or does not include any user facing features. In addition, it includes a screenshot of a successful accessibility scan to ensure no new defects are added to the product.

QA

  • QA Needed

To download an APK when reviewing a PR (after all CI tasks finished running):

  1. Click on Checks at the top of the PR page.
  2. Click on the firefoxci-taskcluster group on the left to expand all tasks.
  3. Click on the build-debug task.
  4. Click on View task in Taskcluster in the new DETAILS section.
  5. The APK links should be on the right side of the screen, named for each CPU architecture.

GitHub Automation

Fixes #26041

@jonalmeida jonalmeida added the needs:review PRs that need to be reviewed label Jul 28, 2022
@jonalmeida jonalmeida requested review from a team as code owners July 28, 2022 21:28
@jhugman jhugman changed the title Close #26041: Re-evaluate TCP experiment after Nimbus is ready. Close #26041: Re-set TrackingProtectionPolicy after Nimbus updates are applied Jul 29, 2022
@jhugman jhugman changed the title Close #26041: Re-set TrackingProtectionPolicy after Nimbus updates are applied Close #26041: Re-set TrackingProtectionPolicy after Nimbus SDK is initialized Jul 29, 2022
Copy link
Contributor

@jhugman jhugman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good minimal fix for TCP. I might also recommend we flip the default for TCP in the nimbus.fml.yaml to true. I've asked bugs to be filed if they haven't already for the tabs prioritization.

// value again so that we can still access the NimbusApi that is wrapped
// in `FxNimbus.initialize.getSdk`.
// See: https://github.com/mozilla-mobile/fenix/issues/26041
FxNimbus.api = this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fixed mozilla/application-services#4999 first released in v93.6.0

It should already be in main, and releases_v104.0.0.

Added mozilla-mobile/android-components#12567 to land on main, this should be uplifted to releases_v104.0.0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FeatureHolder within FxNimbus has a lazy block around it
which also ends up caching the first result it evaluates.

This isn't quite accurate: prior to v93.6.0, the feature holder didn't get to hear about a call to connect the Nimbus SDK (the Rust code that connects to the server and calculates enrollment) if it had been used already.

So if the feature was used before the very earliest steps to initialize the SDK had been started, then we return the default values because if the SDK isn't available, the feature just returns that defaults every time its called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fixed mozilla/application-services#4999 first released in v93.6.0

It should already be in main, and releases_v104.0.0.

Added mozilla-mobile/android-components#12567 to land on main, this should be uplifted to releases_v104.0.0.

@jhugman Upgrading to the latest version (that is in mozilla-mobile/android-components#12592), I'm still not able to get the experiment to work without adding this line.

At this point, I'm more inclined to leaving this line in here and filing a bug to have it fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonalmeida @jhugman I don't think we can land the FxNimbus.api = this assignment because it's not thread-safe. It's a public var without any visibility guarantees, i.e., it's not volatile or synchronized, and also executed on an IO thread here.

Maybe that's also part of the problem? I know @jhugman made FeatureHolder thread-safe but this assignment isn't (to be fair I'd think it's also not meant to be called like this.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be fair I'd think it's also not meant to be called like this

This is correct. This is a workaround, using the old API, which we've deprecated, and are trying to remove.

As discussed, the fix is to upgrade the FML tooling to at least v93.6.0 (currently on v93.5.0).

That said, there is an issue with upgrading to latest (currently v93.8.0) due to mozilla/application-services#5079 .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jhugman Above @jonalmeida mentioned that upgrading to the latest version didn't fix the problem. We should take another look together, please.

@jonalmeida jonalmeida requested a review from jhugman August 2, 2022 23:51
@jonalmeida jonalmeida changed the title Close #26041: Re-set TrackingProtectionPolicy after Nimbus SDK is initialized Close #26041: Allow Nimbus to initialize first before the engine Aug 3, 2022
…bus SDK is initialized

There are three issues here that we have uncovered while investigating
this bug:

 1. Settings.kt has a lazy block around `enabledTotalCookieProtection`
    which ends up caching the first result it evaluates.
 3. The `FeatureHolder` within FxNimbus caches the incorrectly
    evaluated value and returns this value hence forth.
 4. Nimbus is not ready to return a result for an engine experiment
    when we need it early on in the dependency tree initialization.

There are multiple systems that require engine to be initialized for
 them to work (e.g. Glean, Profiler, concept-fetch). In our TCP,
 experiment, we need to apply these engine settings during the engine
 initialization. So when we try and evaluate Nimbus that early on, it
 has not had time to initialize itself correctly or even use the
 engine's concept-fetch client to return the correct experiment result.
 This bug is made worse because of the first two caching bugs where we
 are always holding onto a cached value of the wrong result.

Our temporary solution is to:

 1. Remove the `lazy` around `Settings.enabledTotalCookieProtection`.
 2. Set the `FxNimbus.api` value right after we are done initializing
    `FxNimbus` and `NimbusApi` so that all future queries to FxNimbus
    will be made against a real instance of `NimbusApi`. This is a
    short-term fix for the `FeatureHolder` caching bug.
 3. Set a new TrackingProtectionPolicy that will evaluate Nimbus now
    that it is in the correct state when receive the
    `NimbusInterface.Observer.onUpdatesApplied`.

Co-authored-by: jhugman <[email protected]>
Co-authored-by: Christian Sadilek <[email protected]>
@jonalmeida jonalmeida changed the title Close #26041: Allow Nimbus to initialize first before the engine Close #26041: Re-set TrackingProtectionPolicy after Nimbus SDK is initialized Aug 4, 2022
Copy link
Contributor

@jhugman jhugman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added further comments in the review.

@jhugman
Copy link
Contributor

jhugman commented Aug 4, 2022

@jonalmeida asks: why does setting FxNimbus.api = this work?

FeatureHolders are instantiated lazily, which is why they get the right getSdk when they’re used after the initialize is called.

However, if they’re instantiated before the initialize method is called, they get the initial value of getSdk.

The initial value of getSdk is { api }.

So changing the value of api allowed a FeatureHolder that the initial value getSdk to get a different result when calling the same getSdk later on.

For more context: FxNimbus.api = nimbus was the in first version of wiring the generated code to the Nimbus SDK. It wasn't satisfactory, precisely for the reasons @csadilek raises, so we deprecated it in favour of a newer threadsafe mechanism.

That it still works is a lucky accident for us, but definitely the not something we want anyone to rely on.

@csadilek
Copy link
Contributor

csadilek commented Aug 4, 2022

FeatureHolders are instantiated lazily, which is why they get the right getSdk when they’re used after the initialize is called. However, if they’re instantiated before the initialize method is called, they get the initial value of getSdk. The initial value of getSdk is { api }.

@jhugman But this workaround here calls FxNimbus.api = this after initialize is called, and after the FeatureHolder (for tracking protection) was instantiated. Why does this fix it? You mentioned the problem occurs when the FeatureHolders are initialized before initialize is called.

@jonalmeida jonalmeida added pr:needs-landing PRs that are ready to land [Will be merged by Mergify] and removed pr:do-not-land needs:review PRs that need to be reviewed labels Aug 9, 2022
@jonalmeida
Copy link
Contributor Author

We've landed the 83.6.0 Nimbus plugin change, so we can safely land this change as well.

@mergify mergify bot merged commit f03ee91 into mozilla-mobile:main Aug 9, 2022
@jonalmeida
Copy link
Contributor Author

jonalmeida commented Aug 10, 2022

QA verified the fix we have in Nightly, so let's uplift this to the 104.0 release; the AC uplift has already landed (albeit it needs a new AC release for it to be used).

@Mergifyio backport releases_v104.0.0

@jonalmeida
Copy link
Contributor Author

@Mergifyio backport releases_v104.0.0

@mergify
Copy link
Contributor

mergify bot commented Aug 11, 2022

backport releases_v104.0.0

✅ Backports have been created

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr:needs-landing PRs that are ready to land [Will be merged by Mergify]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug][Experiment]: The TCP mobile experiment doesn't enable the feature
4 participants