[Bug][Experiment]: The TCP mobile experiment doesn't enable the feature #26041

SoftVision-CosminMuntean · 2022-07-15T11:21:52Z

Steps to reproduce

Open the latest Nightly build downloaded from Google Play Store.
Tap the "..." menu button and tap the "Settings" option.
Tap the "About Firefox Nightly" option and tap 5 times on the Nightly logo to enter in debug mode.
Return to the "Settings" menu and tab the "Secret Settings" option.
Enable the "Use Nimbus Preview Collection" option.
Restart the application (at least twice) in order to have the Preview collection.
Enter again in debug mode (see step 3).
Return to the "Settings" menu and tab the "Nimbus Experiments" option.
Tap the "Total Cookie Protection Experiment v103" experiment.
Tap the "tcp-on" branch and restart the browser.
Navigate to the TCP test page and observe column 3 and 4.

Expected behaviour

Cookies and localStorage are "PARTITIONED" on both 3 and 4 columns.

Actual behaviour

Cookies and localStorage are "UNRESTRICTED" on both 3 and 4 columns.

Device name

Google Pixel 2XL, Samsung s9

Android version

Android 11, Android 10

Firefox release type

Firefox Nightly

Firefox version

103.0a1 and 104.0a4

Device logs

No response

Additional information

Here you can find a screen recording of the issue: Video.
You are successfully force enrolled in the "tcp-on" branch, but the feature is not enabled.
Here is the link to the Nimbus recipe for this experiment: Link.
We have also tested this on a created Firefox 103 build on the release channel and on the latest Nightly build installed from Google Play Store.
I am not sure but is there a chance for this behavior to be related to the one described in [Bug][Experiment]: The TCP prefs are reset to their initial value after a browser restart if they are manually changed from about:config #26040?

┆Issue is synchronized with this Jira Task

Mugurell · 2022-07-15T11:29:07Z

I am not sure but is there a chance for this behavior to be related to the one described in #26040?

They are related but while the behavior from #26040 is expected the Nimbus experiment should override the local settings whatever they may be.

SoftVision-LorandJanos · 2022-07-18T09:01:04Z

I was able to reproduce this issue on the latest Nightly 104.0a1 (2022-07-18) build.
Device used: Google Pixel 4 (Android 12).

Amejia481 · 2022-07-18T17:40:57Z

@SoftVision-CosminMuntean thanks for testing,
I noticed that on the STR we mentioned we are testing on nightly, but on the Recipe's JSON, we are targeting the release channel, it could the problem is that while you were testing you were not eligible to be enrolled on the experiment.

Amejia481 · 2022-07-18T17:46:23Z

Could be try to test on the same channel where we are launching the experiment?

An additional targeting version from the Recipe experiment page:

Full targeting expression (app_version|versionCompare('103.*') <= 0) && (app_version|versionCompare('103.!') >= 0)

SoftVision-CosminMuntean · 2022-07-19T09:40:03Z

@Amejia481 thanks for looking over this issue.

I have also verified with Fenix 103.1.0 and the feature is still not enabled by the experiment. I can enroll in the experiment in the treatment branch, but the feature is not enabled.
We have also created a recipe using the Stage environment and naturally enrolled in the experiment, but the feature is still not enabled.
So it seems that the problem is not that we cannot enroll in the experiment, the problem is that after we enroll in the experiment, in the treatment branch, the TCP feature is not enabled.

However, just to be sure, the test page mentioned in Step 11 is a valid way to verify that we have TCP enabled?

Amejia481 · 2022-07-19T12:34:29Z

Thanks @SoftVision-CosminMuntean, for double checking, I'll continue investigating, I'll asking the nimbus team for help as I'm no sure why every time I query the nimbus API, about the TCP experiment I get the same value independently of the branch that I'm enrolled into.

However, just to be sure, the test page mentioned in Step 11 is a valid way to verify that we have TCP enabled?

Yes, it's good way. Additionally, you could verify by going to about:config and searching for network.cookie.cookieBehavior and it should be set to 5.

SoftVision-CiprianMuresan · 2022-07-19T13:50:27Z

Marking this as S1 as it blocks further experiment testing.

Amejia481 · 2022-07-20T13:25:06Z

Thanks @SoftVision-CiprianMuresan, @jhugman and I we are investigating!

jonalmeida · 2022-07-28T20:36:27Z

Copying over James' JIRA comment here so I can reference it (Fenix JIRA is only a one-way sync):

Progress/investigation:

the lazy caching of the feature value() which can be removed from settings.totalCookieProtection

the value coming out of the value() is always the default value for engine-settings (no other feature). The Variables implementation that the generated code (FML) gets is always NullVariables, which appeared to be null coming out of the SDK.

Closer inspection reveals that the SDK available to the feature is null!

This indicates that the call to get the engine-settings configuration object from FxNimbus i.e. FxNimbus.features.engineSettings.value() is likely happening before FxNimbus.initialize { components.analytics.experiments }

ConceptFetchHttpUploader requires the engine, which is needed by Glean which is initialized before Nimbus; i.e. we can’t experiment with concept-fetch

Side note: concept-fetch is also needed for Nimbus initialization, so when we disable Glean, we get a deadlock. I don’t know why we get a deadlock not a stack overflow.

The FxNimbus.initialize() method takes a closure to allow features to lazily initialize the Nimbus SDK on demand when the feature first needs it. This getSdk property is passed to FeatureHolders on first use.

For features that are used before the Nimbus SDK is constructed, (i.e. engine-settings feature only) the feature holder never has access to the SDK.

Short term unblocking fix:

use a deprecated way of attaching the SDK to the generated FxNimbus code.

FxNimbus.api = nimbus

Longer term:

consider ways of updating the SDK or the getSdk closure after features have been first used, i.e. make the initialize method proactively go update all FeatureHolders with new SDK objects or getSdk closures.

Plan:

Jonathan Almeida will land the short term fix,

Jonathan Almeida and James Hugman to discuss best path/best practice for updating the early-use features with the new value of getSdk

Nimbus team to implement longer term fix recommended in step 2.

To clarify this part:

ConceptFetchHttpUploader requires the engine, which is needed by Glean which is initialized before Nimbus; i.e. we can’t experiment with concept-fetch

Side note: concept-fetch is also needed for Nimbus initialization, so when we disable Glean, we get a deadlock. I don’t know why we get a deadlock not a stack overflow.

There are many parts of the app that require the engine to be initialized ASAP. Even if we are able to disable Glean, the engine will be initialized by the next system that needs (e.g. which is probably the profiler). So concept-fetch isn't the cause of the dependency issue, but the first part of the engine that is immediately needed.

Short term unblocking fix:

use a deprecated way of attaching the SDK to the generated FxNimbus code.

FxNimbus.api = nimbus

With the workaround that is mentioned above, we are then required to the tell the engine to re-evaluate it's settings for any feature we want to experiment with (in this case, we re-evaluate the tracking protection settings).

…s ready. There are three issues here that we have uncovered while investigating this bug: 1. Settings.kt has a lazy block around `enabledTotalCookieProtection` which ends up caching the first result it evaluates. 2. The `FeatureHolder` within FxNimbus has a lazy block around it which also ends up caching the first result it evaluates. 3. Nimbus is not ready to return a result for an engine experiment when we need it early on in the dependency tree initialization. There are multiple systems that require engine to be initialized for them to work (e.g. Glean, Profiler, concept-fetch). In our TCP, experiment, we need to apply these engine settings during the engine initialization. So when we try and evaluate Nimbus that early on, it has not had time to initialize itself correctly or even use the engine's concept-fetch client to return the correct experiment result. This bug is made worse because of the first two caching bugs where we are always holding onto a cached value of the wrong result. Our temporary solution is to: 1. Remove the `lazy` around `Settings.enabledTotalCookieProtection`. 2. Set the `FxNimbus.api` value right after we are done initializing `FxNimbus` and `NimbusApi` so that all future queries to FxNimbus will be made against a real instance of `NimbusApi`. This is a short-term fix for the `FeatureHolder` caching bug. 3. Set a new TrackingProtectionPolicy that will evaluate Nimbus now that it is in the correct state when receive the `NimbusInterface.Observer.onUpdatesApplied`. Co-authored-by: jhugman

…s ready. There are three issues here that we have uncovered while investigating this bug: 1. Settings.kt has a lazy block around `enabledTotalCookieProtection` which ends up caching the first result it evaluates. 2. The `FeatureHolder` within FxNimbus caches the incorrectly evaluated value and returns this value hence forth. 3. Nimbus is not ready to return a result for an engine experiment when we need it early on in the dependency tree initialization. There are multiple systems that require engine to be initialized for them to work (e.g. Glean, Profiler, concept-fetch). In our TCP, experiment, we need to apply these engine settings during the engine initialization. So when we try and evaluate Nimbus that early on, it has not had time to initialize itself correctly or even use the engine's concept-fetch client to return the correct experiment result. This bug is made worse because of the first two caching bugs where we are always holding onto a cached value of the wrong result. Our temporary solution is to: 1. Remove the `lazy` around `Settings.enabledTotalCookieProtection`. 2. Set the `FxNimbus.api` value right after we are done initializing `FxNimbus` and `NimbusApi` so that all future queries to FxNimbus will be made against a real instance of `NimbusApi`. This is a short-term fix for the `FeatureHolder` caching bug. 3. Set a new TrackingProtectionPolicy that will evaluate Nimbus now that it is in the correct state when receive the `NimbusInterface.Observer.onUpdatesApplied`. Co-authored-by: jhugman <[email protected]>

…he engine There are three issues here that we have uncovered while investigating this bug: 1. Settings.kt has a lazy block around `enabledTotalCookieProtection` which ends up caching the first result it evaluates. 2. Nimbus is not ready to return a result for an engine experiment when we need it early on in the dependency tree initialization. 3. The `FeatureHolder` within FxNimbus caches the incorrectly evaluated value and returns this value hence forth. There are multiple systems that require the engine to be initialized for them to work (e.g. Glean, Profiler, concept-fetch). In our TCP, experiment, we need to apply these engine settings during the engine initialization. So when we try and evaluate Nimbus that early on, it has not had time to initialize itself correctly or even use the engine's concept-fetch client to return the correct experiment result. This bug is made worse because of the two other caching bugs where we are always holding onto a cached value of the wrong result. During investigation, we have identified that the BrowserStore was eagerly initialized (which initialized the engine as well via the `EngineMiddleware`) by trying to sending startup metrics from values in the store. By moving Nimbus to be initialized as early on as possible, we're able to guarantee now that our queries to get the feature branch for any engine experiment will provide us with the correct values from Nimbus. It's important to note that moving these calls to gather startup metrics are still happening as soon as possible, and are now more likely to be accurate in the future, if we collect metrics from the engine settings that the user has. In a debug build, we were also seeing a circular dependencies because of the StrictModeManager initializing the engine when trying to access the profiler - this call was also removed because it was originally added a preventative measure that is no longer required. In this change, we have also moved Glean initialization and wallpaper fetching into the `setupInMainProcessOnly` method to keep the original consistency of what is being initialized in the main process and when. This change does not affect the order of initialization in any way, however the inconsistency added a small amount of confusion during investigation, which is worth cleaning up in this patch as well. Co-authored-by: jhugman <[email protected]> Co-authored-by: Christian Sadilek <[email protected]>

…bus SDK is initialized There are three issues here that we have uncovered while investigating this bug: 1. Settings.kt has a lazy block around `enabledTotalCookieProtection` which ends up caching the first result it evaluates. 3. The `FeatureHolder` within FxNimbus caches the incorrectly evaluated value and returns this value hence forth. 4. Nimbus is not ready to return a result for an engine experiment when we need it early on in the dependency tree initialization. There are multiple systems that require engine to be initialized for them to work (e.g. Glean, Profiler, concept-fetch). In our TCP, experiment, we need to apply these engine settings during the engine initialization. So when we try and evaluate Nimbus that early on, it has not had time to initialize itself correctly or even use the engine's concept-fetch client to return the correct experiment result. This bug is made worse because of the first two caching bugs where we are always holding onto a cached value of the wrong result. Our temporary solution is to: 1. Remove the `lazy` around `Settings.enabledTotalCookieProtection`. 2. Set the `FxNimbus.api` value right after we are done initializing `FxNimbus` and `NimbusApi` so that all future queries to FxNimbus will be made against a real instance of `NimbusApi`. This is a short-term fix for the `FeatureHolder` caching bug. 3. Set a new TrackingProtectionPolicy that will evaluate Nimbus now that it is in the correct state when receive the `NimbusInterface.Observer.onUpdatesApplied`. Co-authored-by: jhugman <[email protected]> Co-authored-by: Christian Sadilek <[email protected]>

As part of mozilla-mobile/fenix#26041, we need to ensure we have at least version 93.6.0 of the plugin downloaded. The reason we are not using a newer version, even though they are available, is to reduce the complexity of things that go into the next AC release in order to get the Fenix experiment launched.

jonalmeida · 2022-08-09T20:27:13Z

@SoftVision-CosminMuntean could you retest this experiment using the latest nightly 105 build (available in a few hours from when this comment was written)?

SoftVision-CosminMuntean · 2022-08-10T11:17:43Z

@jonalmeida We have retested this issue on the latest Nightly 105.0a1 build and we are successfully enrolled in the experiment and also the TCP feature is enabled if enrolling in the "tcp-on" branch.

We have verified this on Google 2 XL with Android 11, Samsung S9 with Android 11, and Motorola Moto G(10) with Android 11.

As part of mozilla-mobile/fenix#26041, we need to ensure we have at least version 93.6.0 of the plugin downloaded. The reason we are not using a newer version, even though they are available, is to reduce the complexity of things that go into the next AC release in order to get the Fenix experiment launched. (cherry picked from commit 95c0400)

jonalmeida · 2022-08-11T01:48:04Z

Thank you!

…tialized There are three issues here that we have uncovered while investigating this bug: 1. Settings.kt has a lazy block around `enabledTotalCookieProtection` which ends up caching the first result it evaluates. 3. The `FeatureHolder` within FxNimbus caches the incorrectly evaluated value and returns this value hence forth. 4. Nimbus is not ready to return a result for an engine experiment when we need it early on in the dependency tree initialization. There are multiple systems that require engine to be initialized for them to work (e.g. Glean, Profiler, concept-fetch). In our TCP, experiment, we need to apply these engine settings during the engine initialization. So when we try and evaluate Nimbus that early on, it has not had time to initialize itself correctly or even use the engine's concept-fetch client to return the correct experiment result. This bug is made worse because of the first two caching bugs where we are always holding onto a cached value of the wrong result. Our temporary solution is to: 1. Remove the `lazy` around `Settings.enabledTotalCookieProtection`. 2. Set the `FxNimbus.api` value right after we are done initializing `FxNimbus` and `NimbusApi` so that all future queries to FxNimbus will be made against a real instance of `NimbusApi`. This is a short-term fix for the `FeatureHolder` caching bug. 3. Set a new TrackingProtectionPolicy that will evaluate Nimbus now that it is in the correct state when receive the `NimbusInterface.Observer.onUpdatesApplied`. Co-authored-by: jhugman <[email protected]> Co-authored-by: Christian Sadilek <[email protected]> (cherry picked from commit f03ee91)

This workaround was temporary and is not needed with the Nimbus groovy plugin updates in Android Components.

jonalmeida · 2022-08-11T18:48:00Z

Apologiese @SoftVision-CosminMuntean, we had to land one more patch for this fix. Could you please verify that the experiment is still working in nightly?

…tialized There are three issues here that we have uncovered while investigating this bug: 1. Settings.kt has a lazy block around `enabledTotalCookieProtection` which ends up caching the first result it evaluates. 3. The `FeatureHolder` within FxNimbus caches the incorrectly evaluated value and returns this value hence forth. 4. Nimbus is not ready to return a result for an engine experiment when we need it early on in the dependency tree initialization. There are multiple systems that require engine to be initialized for them to work (e.g. Glean, Profiler, concept-fetch). In our TCP, experiment, we need to apply these engine settings during the engine initialization. So when we try and evaluate Nimbus that early on, it has not had time to initialize itself correctly or even use the engine's concept-fetch client to return the correct experiment result. This bug is made worse because of the first two caching bugs where we are always holding onto a cached value of the wrong result. Our temporary solution is to: 1. Remove the `lazy` around `Settings.enabledTotalCookieProtection`. 2. Set the `FxNimbus.api` value right after we are done initializing `FxNimbus` and `NimbusApi` so that all future queries to FxNimbus will be made against a real instance of `NimbusApi`. This is a short-term fix for the `FeatureHolder` caching bug. 3. Set a new TrackingProtectionPolicy that will evaluate Nimbus now that it is in the correct state when receive the `NimbusInterface.Observer.onUpdatesApplied`. Co-authored-by: jhugman <[email protected]> Co-authored-by: Christian Sadilek <[email protected]> (cherry picked from commit f03ee91)

This workaround was temporary and is not needed with the Nimbus groovy plugin updates in Android Components.

SoftVision-CosminMuntean · 2022-08-12T14:03:13Z

@jonalmeida no problem! We have retested the experiment on the latest Nightly 105.0a1 build and also on the new Beta 104.0b6.
Since the production recipe recipe for this experiment is moved to the "Draft" state we have cloned the recipe on the Stage environment in order to re-test the experiment.

Everything seems to work as expected: we are successfully enrolled in the experiment and also the TCP feature is enabled when enrolling in the "tcp-on" branch.

We have verified this on Google 2 XL with Android 11 and Samsung S9 with Android 11.

jonalmeida · 2022-08-15T15:38:40Z

@cpeterso @athomasmoz FYI: we've landed the patches in Nightly and 104. After the next 104 release, we can start the experiment.

cpeterso · 2022-08-15T17:04:28Z

@cpeterso @athomasmoz FYI: we've landed the patches in Nightly and 104. After the next 104 release, we can start the experiment.

Awesome. 104 will be released next week on 2022-08-23. We can try to launch the TCP experiment soon after.

As part of mozilla-mobile/fenix#26041, we need to ensure we have at least version 93.6.0 of the plugin downloaded. The reason we are not using a newer version, even though they are available, is to reduce the complexity of things that go into the next AC release in order to get the Fenix experiment launched.

SoftVision-CosminMuntean added needs:triage Issue needs triage 🐞 bug Crashes, Something isn't working, .. labels Jul 15, 2022

SoftVision-LorandJanos added S3 Blocks non-critical functionality and a work around exists qa-triaged Issues triaged by qa Nimbus labels Jul 18, 2022

SoftVision-CiprianMuresan added S1 Blocks development/testing, may impact more than 25% of users, causes data loss, potential chemspill and removed S3 Blocks non-critical functionality and a work around exists labels Jul 19, 2022

SoftVision-LorandJanos added S1 Blocks development/testing, may impact more than 25% of users, causes data loss, potential chemspill and removed S1 Blocks development/testing, may impact more than 25% of users, causes data loss, potential chemspill labels Jul 19, 2022

github-actions bot mentioned this issue Jul 28, 2022

Close #26041: Re-set TrackingProtectionPolicy after Nimbus SDK is initialized #26228

Merged

4 tasks

github-actions bot added the eng:reopen-for-qa Reopens and tags the issue for QA needed when the issue is merged label Jul 28, 2022

jhugman mentioned this issue Jul 29, 2022

Upgrade version of AppServices from which we download nimbus-fml. mozilla-mobile/android-components#12567

Closed

4 tasks

jonalmeida mentioned this issue Aug 2, 2022

Settings the FxNimbus.api is still required for initializing Nimbus mozilla/application-services#5075

Closed

jonalmeida mentioned this issue Aug 8, 2022

No issue: Update Nimbus plugin to use version 93.6.0 mozilla-mobile/android-components#12615

Merged

4 tasks

github-actions bot added this to the 105 milestone Aug 9, 2022

github-actions bot reopened this Aug 9, 2022

github-actions bot added eng:qa:needed QA Needed and removed eng:reopen-for-qa Reopens and tags the issue for QA needed when the issue is merged labels Aug 9, 2022

jonalmeida added a commit to jonalmeida/fenix that referenced this issue Aug 11, 2022

Close mozilla-mobile#26041: Remove un-needed Nimbus workaround

95f6a12

This workaround was temporary and is not needed with the Nimbus groovy plugin updates in Android Components.

github-actions bot mentioned this issue Aug 11, 2022

Close #26041: Remove un-needed Nimbus workaround #26427

Merged

4 tasks

github-actions bot added the eng:reopen-for-qa Reopens and tags the issue for QA needed when the issue is merged label Aug 11, 2022

jonalmeida added a commit that referenced this issue Aug 11, 2022

Close #26041: Remove un-needed Nimbus workaround

19fd044

This workaround was temporary and is not needed with the Nimbus groovy plugin updates in Android Components.

jonalmeida closed this as completed in d0c21c0 Aug 11, 2022

jonalmeida closed this as completed in #26427 Aug 11, 2022

github-actions bot reopened this Aug 11, 2022

github-actions bot removed the eng:reopen-for-qa Reopens and tags the issue for QA needed when the issue is merged label Aug 11, 2022

jonalmeida added a commit that referenced this issue Aug 11, 2022

Close #26041: Remove un-needed Nimbus workaround

0a56ead

This workaround was temporary and is not needed with the Nimbus groovy plugin updates in Android Components.

LaurentiuApahideanSV closed this as completed Aug 12, 2022

LaurentiuApahideanSV added eng:qa:verified QA Verified and removed eng:qa:needed QA Needed labels Aug 12, 2022

gabrielluong removed the needs:triage Issue needs triage label Aug 19, 2022

This was referenced Sep 8, 2022

[pull] main from mozilla-mobile:main nathanmkaya/fenix#108

Merged

[pull] main from mozilla-mobile:main fork-house/fenix#12

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug][Experiment]: The TCP mobile experiment doesn't enable the feature #26041

[Bug][Experiment]: The TCP mobile experiment doesn't enable the feature #26041

SoftVision-CosminMuntean commented Jul 15, 2022 •

edited by data-sync-user

Loading

Mugurell commented Jul 15, 2022

SoftVision-LorandJanos commented Jul 18, 2022

Amejia481 commented Jul 18, 2022

Amejia481 commented Jul 18, 2022

SoftVision-CosminMuntean commented Jul 19, 2022

Amejia481 commented Jul 19, 2022 •

edited

Loading

SoftVision-CiprianMuresan commented Jul 19, 2022

Amejia481 commented Jul 20, 2022

jonalmeida commented Jul 28, 2022 •

edited

Loading

jonalmeida commented Aug 9, 2022

SoftVision-CosminMuntean commented Aug 10, 2022

jonalmeida commented Aug 11, 2022

jonalmeida commented Aug 11, 2022

SoftVision-CosminMuntean commented Aug 12, 2022

jonalmeida commented Aug 15, 2022

cpeterso commented Aug 15, 2022

[Bug][Experiment]: The TCP mobile experiment doesn't enable the feature #26041

[Bug][Experiment]: The TCP mobile experiment doesn't enable the feature #26041

Comments

SoftVision-CosminMuntean commented Jul 15, 2022 • edited by data-sync-user Loading

Steps to reproduce

Expected behaviour

Actual behaviour

Device name

Android version

Firefox release type

Firefox version

Device logs

Additional information

Mugurell commented Jul 15, 2022

SoftVision-LorandJanos commented Jul 18, 2022

Amejia481 commented Jul 18, 2022

Amejia481 commented Jul 18, 2022

SoftVision-CosminMuntean commented Jul 19, 2022

Amejia481 commented Jul 19, 2022 • edited Loading

SoftVision-CiprianMuresan commented Jul 19, 2022

Amejia481 commented Jul 20, 2022

jonalmeida commented Jul 28, 2022 • edited Loading

jonalmeida commented Aug 9, 2022

SoftVision-CosminMuntean commented Aug 10, 2022

jonalmeida commented Aug 11, 2022

jonalmeida commented Aug 11, 2022

SoftVision-CosminMuntean commented Aug 12, 2022

jonalmeida commented Aug 15, 2022

cpeterso commented Aug 15, 2022

SoftVision-CosminMuntean commented Jul 15, 2022 •

edited by data-sync-user

Loading

Amejia481 commented Jul 19, 2022 •

edited

Loading

jonalmeida commented Jul 28, 2022 •

edited

Loading