Mitigate randomized Experiment selection, feature flags for performance analysis #45
Comments
Triage: we have no known experiments in Fenix GA, so this is low priority. In fact, Colin is looking at taking experiments out of GA to improve startup perf.
Maybe mozilla-mobile/fenix#6278 will help here.
Triage: the P1 ask is to opt out of all experiments, especially on CI. We can file a follow-up to opt in to experiments. This hinges on when experiments will be reintroduced, so I'll contact the folks involved.
Triage: we want to wait for Eric's opinion, but we suspect this could be a high priority, just below startup work.
We have experiment selection code on start up now: mozilla-mobile/fenix#16901. Moving to triage.
Triage: the secure storage experiment is landing and csadilek does not expect it to have a perf impact on start up or page load (where our tests are). We could double-check the PR though: mozilla-mobile/fenix#18333.
This shouldn't affect the measurements we take for start up or page load, afaict.
Triage: there are new experiments in discussion. N.B.: they may not land soon, or ever. They are:
|
Those three experiments (1 Leanplum and 2 Nimbus) are going to be implemented after the MR1 release (mid-April). They will be running simultaneously.
The top two issues don't seem like they'll affect start up. The last one, mozilla-mobile/fenix#18375, might for FNPRMS, where the message will presumably pop up on the homescreen after the third run (since we don't have conditioned profiles). csadilek also mentions there's a secure storage experiment, but it probably doesn't impact start up either. csadilek also mentions that we have a menu in the secret settings that displays all the known experiments. This could be useful for figuring out when it's worth working on this issue.
Triage: no new experiments off the top of our heads.
Triage: even if this doesn't affect perf tests, this affects experiments.
Triage: desktop runs into the same problem, let's ask Bas for their opinion. From our discussion, csadilek wants to test what we're shipping in Nightly, i.e. no preferences or overrides. mcomella wants to pin to one set of experiments for each test so our tests are more controlled.
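To make the two positions above concrete, here is a minimal sketch of how a test harness might choose between opting out of everything, pinning a fixed set of experiments, and leaving enrollment as shipped. Everything in it is an assumption for illustration: `select_branch`, its parameters, and the hash-based bucketing are hypothetical and are not the Fenix, Nimbus, or Leanplum API.

```python
import hashlib
from typing import Mapping, Optional, Sequence


def select_branch(
    experiment: str,
    branches: Sequence[str],
    client_id: str,
    opt_out: bool = False,
    pinned: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the branch a client is enrolled in, or None if not enrolled.

    Hypothetical helper: the real enrollment logic in Fenix is not shown
    in this issue, so the bucketing below is only a stand-in.
    """
    # Opt-out mode (the P1 ask for CI): never enroll in anything.
    if opt_out:
        return None

    # Pinned mode: only experiments listed in `pinned` are active, each
    # forced to a fixed branch, so every test run sees the same set.
    if pinned is not None:
        branch = pinned.get(experiment)
        if branch is not None and branch not in branches:
            raise ValueError(f"unknown branch {branch!r} for {experiment!r}")
        return branch

    # "As shipped" mode: stand-in for randomized enrollment, made
    # deterministic per client by hashing the experiment and client id.
    digest = hashlib.sha256(f"{experiment}:{client_id}".encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(branches)
    return branches[index]
```

With `opt_out=True` the result is always `None`; with `pinned={"some-experiment": "treatment"}` that experiment is forced to `treatment` and every other experiment is off. Both modes remove the run-to-run variation that randomized enrollment would otherwise add to perf measurements.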
Triage: if Bas remembers correctly, we opt out of all experiments for perf tests (though we should verify this with sparky; edit: Sparky thinks this is set here for normandy). To ensure we don't regress performance for code behind experiment flags: 1) we expect developers to regularly run their patches against perf tests on try, 2) when we run the experiment in production, we can compare perf telemetry against the baseline (example), and 3) when the experiments finally get released, we test them against CI and users. This ensures experiments are tested in isolation but does not test them in combination (particularly because cohorts allocated to one experiment cannot be put into other experiments). So it's possible, but unlikely, that two experiments that cause no perf issues in isolation regress when turned on simultaneously, and we would not have tested that. To get ourselves closer to desktop, we have some follow-ups to do:
|
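The isolation-versus-combination gap described in the previous comment can be made concrete with a small sketch. If each cohort is in at most one experiment, no production data ever covers two experiments enabled together, so the pairs below are the ones a dedicated CI run would have to cover. The function name and the experiment names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def untested_pairs(experiments: Sequence[str]) -> List[Tuple[str, str]]:
    """List every pair of experiments that isolated testing never exercises.

    Assumes cohorts are mutually exclusive, as described in the thread,
    so no client is ever enrolled in two experiments at once.
    """
    # De-duplicate and sort so the output order is stable between runs.
    return list(combinations(sorted(set(experiments)), 2))


if __name__ == "__main__":
    # Hypothetical names standing in for three simultaneous experiments.
    for pair in untested_pairs(["exp-a", "exp-b", "exp-c"]):
        print(pair)
```

For three simultaneous experiments this yields three pairs. The count grows quadratically with the number of experiments, which is one reason to test combinations only when there is a specific suspicion of interaction.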
Will close since this has become a meta issue.
Why/User Benefit/User Problem
In our analysis of debug vs. release builds (mozilla-mobile/fenix#6931), we decided to use a release-like build. However, in release-like builds, Experimentation is enabled and users may be randomly enrolled in various experiments: this bug is to understand the implications and mitigate them.
Furthermore, some features are behind "feature flags" so they're enabled in Nightly and debug builds only: we should figure out how to address these too (should this be a separate bug?).
Impact
Without this, performance results will not be reliable because random experiment opt-in may impact our ability to accurately measure performance.
Acceptance Criteria (how do I know when I’m done?)