Telemetry probe to measure frequency of reloading active tabs from killed content processes #9366
Comments
I'm also thinking that we would want to cancel any such timers as soon as the entire app is backgrounded. Thoughts?
This seems great to have, yes. Talked to @agi today about handling low memory conditions and his advice for the future (multi-e10s) was to just rely on letting the OS kill processes instead of reacting to
Can you share some details or a doc on how content process allocation works? I am wondering how this would be implemented. Would it "simply" be a case of recording a timestamp when
Wondering what @pocmo thinks too.
Sure.
Allocation
Allocation is governed by the
The process count is then used as a number of buckets to slot
For example, suppose I am running Nightly and have not opened any sessions yet. Opening 3 tabs (let's assume simple integral, monotonically-increasing session ids) will result in 3 content processes:
When opening a fourth tab, we must now re-use a content process:
If I were to just continue opening new sessions, it would appear to be round-robin. Now suppose that we closed the
When the next
Prioritization
The process hosting the currently active
GeckoView specifies these priority levels in such a way that causes Android to prefer killing lower-priority processes before it goes after higher-priority processes.
Tying this in with AC and the telemetry probe
Last summer, after some discussions with @pocmo, he opened #7820 to ensure that killed sessions were lazily restored, which was great. Generally speaking, we're not particularly interested in data about inactive sessions whose
What we are interested in, however, is the rate that content processes are killed when those processes were hosting the active session, causing AC to require reloading that session in short order. This data would essentially track bug 1682319 for us.
(And we also care about this in the single content process case, because I believe that one effective mitigation to the bug 1682319 problem may be multi-e10s itself!)
@dblohm7 Trying to map this to the events we are seeing at the AC level, ... you want to track in telemetry how often we see
I think something that would be interesting to track is:
These metrics could help us track how memory/performance affects retaining tabs versus getting them killed, and how often our users get their tabs reloaded. They can also help us know how many tabs are actually alive on users' devices (e.g. when thinking about memory/performance tradeoffs based on number of tabs).
FWIW we should also track the
@pocmo Yes, that sounds correct to me.
@agi could you raise a separate issue for these other metrics so we don't block on them?
Sure: #9624
… session killed" and track engine session lifetime.
* Once we link an `EngineSession` to a `Session` we track the time.
* The separate `BrowserAction` allows us to write a Middleware for this event.
* I was unhappy with SystemClock requiring the Android stdlib and therefore making mocking a pain, or requiring the slow Robolectric test runner. I ended up with this wrapper class, which seems to work well in Fenix when writing unit tests.
The next step is to write a Middleware in Fenix that looks at those events and records metrics in Glean. I will open a PR for that soon.
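The wrapper-class idea in the commit message above might look roughly like the following. This is only a minimal sketch, not the class that actually landed: `ClockDelegate`, `DefaultDelegate`, `Clock`, `FakeClock` and `EngineSessionLifetime` are hypothetical names, and the default delegate uses `System.nanoTime()` so the sketch runs on a plain JVM (an Android build would presumably call `SystemClock.elapsedRealtime()` there instead).

```kotlin
// Hypothetical sketch; none of these names are taken from Android Components.

/** Source of elapsed-time readings in milliseconds. */
interface ClockDelegate {
    fun elapsedRealtime(): Long
}

/**
 * Default delegate for plain JVM code. Assumption: on Android this would
 * forward to SystemClock.elapsedRealtime() instead.
 */
class DefaultDelegate : ClockDelegate {
    override fun elapsedRealtime(): Long = System.nanoTime() / 1_000_000
}

/** Single access point for time, with a delegate that tests can swap out. */
object Clock {
    var delegate: ClockDelegate = DefaultDelegate()

    fun elapsedRealtime(): Long = delegate.elapsedRealtime()

    /** Restores the default delegate, e.g. at the end of a test. */
    fun reset() {
        delegate = DefaultDelegate()
    }
}

/** Test double whose time only moves when told to. */
class FakeClock(var now: Long = 0L) : ClockDelegate {
    override fun elapsedRealtime(): Long = now

    fun advanceBy(millis: Long) {
        now += millis
    }
}

/** Remembers when an engine session was linked and reports its lifetime when killed. */
class EngineSessionLifetime {
    private val linkedAt = mutableMapOf<String, Long>()

    fun onLinked(sessionId: String) {
        linkedAt[sessionId] = Clock.elapsedRealtime()
    }

    /** Returns the lifetime in milliseconds, or null if the session was never linked. */
    fun onKilled(sessionId: String): Long? {
        val start = linkedAt.remove(sessionId) ?: return null
        return Clock.elapsedRealtime() - start
    }
}
```

In a unit test one would set `Clock.delegate = FakeClock()`, advance the fake by a known amount between `onLinked` and `onKilled`, and call `Clock.reset()` afterwards. No Android classes or Robolectric are needed for that.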
Required patches landed in AC and Fenix and I just triggered a new Fenix Nightly. By next Monday we should start to see some data for Nightly.
Hi, verified as fixed on latest master using a Pixel 2 API 28 (Android 9) Emulator. Had 2 open tabs (one in foreground, one in background). Properly generated metrics ping.
In GV we have some telemetry to measure lifetime for content processes and such, however in this particular case I believe that AC is the right layer in the stack for this probe.
As I scale up the number of content processes permitted by GV, we want to answer the following question:
I'm thinking that a timer might be the best option, since then we get both timings and frequency and can just Analyze All The Things afterward.
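As a rough illustration of the timer idea, the sketch below starts timing when a tab becomes active, records a sample only if the content process hosting that active tab is killed, and drops the pending timer when the app is backgrounded (as suggested elsewhere in this thread). `ActiveTabKillTimer` and its method names are hypothetical, not AC or Glean APIs, and the in-memory list merely stands in for whatever metric type would actually be used.

```kotlin
// Hypothetical sketch; the class and method names are illustrative only.
class ActiveTabKillTimer(private val now: () -> Long) {
    private var activeTabId: String? = null
    private var startedAt: Long = 0L

    /** Milliseconds between a tab becoming active and its content process being killed. */
    val recordedDurations = mutableListOf<Long>()

    /** Starts (or restarts) the timer for the tab that just became active. */
    fun onTabActivated(tabId: String) {
        activeTabId = tabId
        startedAt = now()
    }

    /** Records a sample only when the killed process was hosting the active tab. */
    fun onContentProcessKilled(tabId: String) {
        if (tabId != activeTabId) return
        recordedDurations.add(now() - startedAt)
        activeTabId = null
    }

    /** Cancels the pending timer without recording anything. */
    fun onAppBackgrounded() {
        activeTabId = null
    }
}
```

With this shape, the number of samples gives the frequency and the sample values give the timings, so both can be analyzed from one probe. Passing the time source in as a lambda keeps the class testable without Android's `SystemClock`.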