Add confidence field to PerformanceNavigationTiming #202
Comments
@csharrison FYI |
Thanks for tagging me. I am excited to see this proposal progress. On the last web perf call there was some mention of making this extensible to multiple data types beyond |
Wanted to chime in with some soft feedback about just the API shape: Right now the proposal is to just expose a Any existing user / observer of the navigation timing API (there are lots) just looking at the raw output would need to know ahead of time that this is a "fuzzed" value with a specific epsilon. It feels to me like most folks in most situations would not have this extra context, and it would be worth being more self-documenting as it is a very new use case... Strawman: What about some value-wrapper to make it very explicit that the value is fuzzed?: interface Fuzzy<T, U extends number> {
fuzzyValue: T;
epsilon: U;
} Then I do see a reference in the alternatives considered section to separating the ancillary data and exposing the I am not sure: is coupling all fuzzed values a necessary requirement in order to maintain unreliability? It seems to me like maybe not necessarily, unless the values being reported are inherently correlated? I guess another alternative would just be to rely on naming convention: |
+1 to including the epsilon value (or some rate of flipping) in the API itself. This provides a few benefits:
In either of these cases, providing the value upfront ensures consumers can interpret the data properly (which is important for the debiasing step). |
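To make the "epsilon value (or some rate of flipping)" point concrete, here is a minimal sketch of how the two quantities could relate for a binary high/low value. It assumes a mechanism the thread does not confirm: with probability `triggerRate` the reported value is replaced by a uniform random pick between the two options, otherwise the true value is reported. The function names are hypothetical and are not part of any proposed API.

```typescript
// Assumed mechanism (illustrative only): with probability `triggerRate`
// the report is a uniform random pick from {high, low}; otherwise it is
// the true value. Under that assumption:
//   P(report = truth) = 1 - triggerRate / 2
//   P(report flipped) = triggerRate / 2

// Epsilon is the log of the ratio between those two probabilities.
function epsilonFromTriggerRate(triggerRate: number): number {
  if (!(triggerRate > 0 && triggerRate <= 1)) {
    throw new RangeError("triggerRate must be in (0, 1]");
  }
  return Math.log((2 - triggerRate) / triggerRate);
}

// Inverse mapping: solve (2 - r) / r = e^epsilon for r.
function triggerRateFromEpsilon(epsilon: number): number {
  if (epsilon < 0) {
    throw new RangeError("epsilon must be non-negative");
  }
  return 2 / (Math.exp(epsilon) + 1);
}
```

Under this assumed mechanism, a trigger rate of 0.5 corresponds to epsilon = ln 3, so exposing either number would let a consumer derive the other.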
@mmocny - Thanks for that feedback. I can update the proposal to ensure we capture the triggerRate as part of the API shape. I don't see anything in the webidl spec that describes exactly what you suggested. Do you know if that's possible? @csharrison - Thanks! I think the next request would be "what conditions triggered this to be low confidence?". We discussed something like this offline:
This isn't an exhaustive list of conditions (https://github.com/w3c/web-performance/wiki/Nice-things-we-can%27t-have), and it already has a fairly high flip probability with RR. I'm concerned there may not even be enough data in a particular bucket to successfully debias the data, given the flip rate. I'm not familiar with the ins and outs of the more complex local differential privacy algorithms though, so open to ideas :) One other idea I had, that might be a simpler(?) approach to answer why the confidence rating is low, is something like:
and then that could be exposed via
But we'd probably want to cap the number of reasons returned, and maybe limit it being non-empty to only low confidence cases to help reduce the flip probability. |
@mwjacksonmsft thanks. One other question: let's say we expand this interface to support more data types - would we expect some users to still prefer a higher accuracy single If so, we may want to consider a "query" type API, where rather than having some static fields on For the two extensions you mentioned, there:
I would also suggest considering an even simpler version of In any case, I have a colab showing how variance will change under both randomized response (krr) and the more advanced RAPPOR algorithm as the number of dimensions increases (with epsilon=ln 3, # of dimensions/outputs on x axis): It would not surprise me if in this high privacy regime, the utility will be bad as we increase dimensions from binary (which has variance < N). |
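As a rough illustration of why utility drops as the number of outputs grows, the sketch below uses the textbook definition of k-ary randomized response: the true value is reported with probability e^ε / (e^ε + k − 1), and each other value with probability 1 / (e^ε + k − 1). This is not taken from the colab, which may parameterize things differently, and the function names are hypothetical.

```typescript
// Textbook k-ary randomized response (k-RR); an illustration only,
// not the code from the colab referenced in this thread.

// Probability that the reported value equals the true value.
function krrTruthProbability(k: number, epsilon: number): number {
  if (!Number.isInteger(k) || k < 2) {
    throw new RangeError("k must be an integer >= 2");
  }
  if (epsilon < 0) {
    throw new RangeError("epsilon must be non-negative");
  }
  const e = Math.exp(epsilon);
  return e / (e + k - 1);
}

// Probability of reporting one specific value other than the true one.
function krrOtherProbability(k: number, epsilon: number): number {
  return krrTruthProbability(k, epsilon) / Math.exp(epsilon);
}
```

With epsilon = ln 3, the truth probability is 3/4 for k = 2 and falls to 3/7 for k = 5, so each additional output value leaves less signal per report.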
This issue came up for discussion today in the WG. To clarify my statement, I don't have an immediate ask from anyone asking for that, but I can see that being the next ask. I'm reaching out to our customers to get their input. What factors would need consideration if we were to extend this at a later point in time? For example, you mentioned in the colab "The other downside of RAPPOR is that in the epsilon=ln(3) range, RAPPOR underperforms k-RR until k>5.". If we picked RR now, would it be reasonable to switch the algorithm out when k > 5 if that were expressed in some form in the API? |
@mwjacksonmsft I think the primary factor is just dealing with a breaking change. E.g. if we move from randomized response to RAPPOR, everyone will need to update their code to deal with a new format / debiasing strategy. To mitigate this, we could try to make the API forward-compatible with algorithm changes, but that increases complexity. LMK if it makes sense. |
@csharrison Could you elaborate on how the API shape might need to change to be forward compatible? I'd imagine that the debiasing strategy would be the more problematic aspect. Or am I missing something? |
There is a spectrum of breakage:
I think (3) and (4) are probably the worst, so let me give you an example of how we could get there. Imagine we do some research and it turns out there are some use-cases that want to capture any This is a reasonable request! However, we supply A possible alternative could be a dynamic method on |
Note: this was discussed on the June 6 2024 W3C WebPerf Working Group call, minutes here. There was some discussion on what "low confidence" means and a request to discuss some of the use-cases further. |
@csharrison Thanks. I've heard back from our customers. As long as the contributing factors remain outside of their control, they couldn't immediately think of needing to know a reason why the To the example that you highlighted, we need to think through if/how those values would be exposed via |
I've pushed an updated version of the explainer that includes the |
@mwjacksonmsft sounds good. If we're reasonably confident we can stick with just a |
Following the discussion here and looking at the explainer, I think we have two alternative API shapes.

NavigationTiming entry attribute

Adding a PerformanceNavigationTimingConfidence attribute on the performance timeline. To address @mmocny's concerns, we could either name the attribute something like randomizedConfidence or make sure that the internal

The pro of this approach is that we'd be attaching the value to the NavigationTiming timeline itself, making it clearly about navigation timing. It's also simple and discoverable. The cons:
|
The second proposal is very similar to Add new Type value for performance.getEntriesByType If we were to pursue that approach, and we expanded this signal beyond navigation load times, there needs to be some way of correlating the confidence value returned from the new API with corresponding performance object. Two examples that come to mind are:
As randomized confidence data is only useful for backend processing, I expect developers need to bundle it up with the original object and include both for backend processing. Something like:
|
We are also frequently seeing bimodal distributions when analyzing website performance, and have started tracking some of these headers, which has given us more detailed information about “server think time”...
Has anyone else explored those to see if they are part of the reason for this behavior and thought about how to include this in any 'confidence' interval. This site is a good example: https://www.webpagetest.org/result/240624_AiDcCE_AR0/
And subsequently divergent FCP values which seem to be correlated: |
Hi @ear-dev - The proposal has been mostly focused on factors that impact the user agent, so I hadn't considered server-side timings in this proposal. In local testing, I do see these reflected in the `serverTimings`
Does this information meet your needs to determine if the page is slow due to "server think time"? |
@yoavweiss I ended up building a couple of different prototypes to test out these options. The first one attaches the
The second one uses a
The third one builds upon the first and allows a dictionary object to be passed via
I have two main concerns with the second option. First, developer ergonomics: the data isn't useful locally, and the only thing you can do is bundle it up for backend processing. Second, providing too many configuration options could introduce privacy concerns if the API is called multiple times with different parameters. Admittedly it's hard to quantify that without a more concrete proposal. This second concern is equally applicable to the third option. |
Apologies for my slowness. I think that the main difference between 1 and 2/3 is related to future extensibility and ergonomics. From my perspective, option 2 is more flexible: we could extend it in the future to have different confidence levels per entry (e.g. if the page started loading under stress but calmed down later, we'd be able to express that if we so wished).
Sure, but that's also true for NavigationTiming. The data is only useful in aggregate, but with (1) if we want to know the confidence value of a specific entry, we'd need to inspect its relative NavigationTiming entry, which feels less ergonomic somehow.
I wouldn't expect the confidence level to change when inspected multiple times. Would it? |
What would be the outcome of exposing these? What's the action a web author can take when "the confidence in the navigation entry is low because of thermal pressure"? |
@noamr I connected with our customers about this. Their feedback was that as long as the contributing factors remain outside of their control, they couldn't think of a need to know why the confidence value was low. Consequently, I've dropped this from the proposal. |
Is the suggestion that this API might return more than one entry per type or that we'd update the existing entries as new information became available? Or something else?
In the prototype I built, I ended up with a preference for (1) for two reasons:
However, I could see (2) being updated to take an entry instead of an entry type, which addresses those concerns. Perhaps something like:
WDYT?
@csharrison mentioned this:
I was expressing a concern that if we allowed this, that it might result in less privacy if called multiple times requesting different sensitive attributes. |
I actually think the idea to hang this on the observer is the most consistent. Also for navigation timing, reading this value before the Something like: |
@noamr Were you thinking that the |
No, I think that once you explicitly opted in to this in the observer, we can simply add |
@noamr Thanks for clarifying. I imagine that case, if the developer makes this call However, if they held onto e.g.
Does that align with how you were thinking about it? Here is a prototype of the API changes: 5766476: Prototype implementation confidence from observer | https://chromium-review.googlesource.com/c/chromium/src/+/5766476 |
Need to think about exact API names but this is the direction I was thinking about, yes. |
The downside of having this only be available to performance observers is that it'd be impossible to collect this data for navigations that never make it to their load event. |
Can the confidence level change between receiving the response headers and the load event? Also, is this planned to be exposed in iframes? |
It seems unlikely, but I'm not sure. The use case for our customers is for navigation that occurs during a user agent cold launch. We've discussed future potential factors such as extensions impact, or other system resource considerations (e.g. high cpu usage). https://github.com/w3c/web-performance/wiki/Nice-things-we-can%27t-have
I don't have a preference if this is exposed within iframes or not. For SystemEntropy, we did decide it shouldn't be exposed within iframes, but that was without any privacy protections. |
@noamr Upon re-reviewing the data we collected, the data suggests that most of the randomness that occurred was between navigationStart and responseEnd. There didn't appear to be much variation after domLoading. However, the caveat to that is this data was narrowly collected for user agent launch scenarios. |
I have one more small suggestion for this proposal regarding how developers deal with noise. One simplification for developers would be for the platform to debias each report individually. This could look like exposing both a noisy Imagine we have an
These numbers come from the formula If you have a slice of records and you want to count how many are high confidence, you can just sum up each record's unbiased count without doing any other math and you will get an unbiased estimate of the total. For a histogram breakout, let the mass of each record in the histogram be its unbiased count, etc. etc. |
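The formula and example numbers in the comment above did not survive on this page, so the following is only a sketch of one standard per-record estimator, not necessarily the one the comment used. It assumes a binary high/low value and a mechanism where, with probability `triggerRate`, the report is replaced by a uniform random pick. The names `unbiasedHighCount` and `estimateHighTotal` are hypothetical.

```typescript
type Confidence = "high" | "low";

// Per-record unbiased count of "high" under the assumed mechanism:
// with probability `triggerRate` the report is a uniform random pick,
// otherwise it is the true value. The expectation of this value is 1
// when the true value is "high" and 0 when it is "low".
function unbiasedHighCount(reported: Confidence, triggerRate: number): number {
  if (triggerRate < 0 || triggerRate >= 1) {
    throw new RangeError("triggerRate must be in [0, 1)");
  }
  const indicator = reported === "high" ? 1 : 0;
  return (indicator - triggerRate / 2) / (1 - triggerRate);
}

// Summing per-record counts gives an unbiased estimate for any slice
// of records, with no further math needed on the consumer side.
function estimateHighTotal(reports: Confidence[], triggerRate: number): number {
  return reports.reduce((sum, r) => sum + unbiasedHighCount(r, triggerRate), 0);
}
```

Individual counts can be negative or greater than 1 (for a trigger rate of 0.5 they are 1.5 and -0.5); only sums over many records are meaningful.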
OK, I guess what we know that feeds into confidence (e.g. cold start) is known when the document is created. Still, I think we should figure out if this feature should be available in iframes, to avoid a situation where multiple iframes are created to try to track changes in confidence continuously to reduce the noise. |
I'm comfortable with returning null for the |
To enable developers to discern if the navigation timings are representative for their web application, the change adds a new ‘confidence’ field to the PerformanceNavigationTiming struct. The confidence field should not be populated until the confidence value is finalized.

This change contains the blink changes to expose the new API, as well as the infrastructure for the cross-process communication. This change always returns a value of 'high' confidence for top level navigations, and returns null for iframes. In future changes noise will be added via a differential privacy algorithm.

The finalized confidence is stored in the DocumentLoadTiming class, and is sent by the RenderFrameHostImpl in response to a notification that the document is now 'interactive'.

Changes to add usecounters will be added after this change lands.

Explainer: https://github.com/MicrosoftEdge/MSEdgeExplainers/blob/main/PerformanceNavigationTiming%20for%20User%20Agent%20Launch/explainer.md
Chrome Status: https://chromestatus.com/feature/5186950448283648
Dev Design: https://docs.google.com/document/d/1D6DqptsCEd3wPRsZ0q1iwVBAXXmhxZuLV-KKFI0ptCg/edit?usp=sharing
I2P: https://groups.google.com/a/chromium.org/g/blink-dev/c/o0F7nBKsgg0/m/bJSp3ekfAAAJ
W3C Issue: w3c/navigation-timing#202

Manual testing steps:
1) Start the browser with these command line parameters: --enable-features=PerformanceNavigationTimingConfidence https://example.com
2) Open the developer tools, and switch to the Console tool.
3) Run "window.performance.getEntriesByType('navigation')[0].confidence" into the console, and you should see 'high' returned.

Bug: 1413848
Change-Id: I9590c6a3899aa756af6abc6d6c1a7c2b88bde439
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5906123
Auto-Submit: Mike Jackson <[email protected]>
Commit-Queue: Mike Jackson <[email protected]>
Reviewed-by: Yoav Weiss (@Shopify) <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1370718}
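The manual testing step above could be wrapped in a small defensive helper along these lines. This is a sketch only: `NavigationEntryLike` and `readNavigationConfidence` are hypothetical names, and the value is typed as `unknown` because the final shape of `confidence` is not settled in this thread.

```typescript
// Minimal structural type covering only the field discussed in this issue.
// `confidence` is experimental, so it is optional and left as `unknown`.
interface NavigationEntryLike {
  entryType: string;
  confidence?: unknown;
}

// Returns the confidence of the first navigation entry.
// undefined: no navigation entry, or the field is not supported.
// null: the field exists but is not populated (e.g. iframes, per the change above).
function readNavigationConfidence(entries: NavigationEntryLike[]): unknown {
  const nav = entries.find((e) => e.entryType === "navigation");
  return nav === undefined ? undefined : nav.confidence;
}

// In a browser this would be called as:
//   readNavigationConfidence(performance.getEntriesByType("navigation"))
```

Keeping `undefined` and `null` distinct lets a reporting script tell "feature not available" apart from "value withheld".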
Web applications may suffer from bimodal distribution in page load performance, due to factors outside of the web application’s control. For example:
In these scenarios, content the web app attempts to load will be in competition with other work happening on the system. This makes it difficult to detect if performance issues exist within web applications themselves, or because of external factors.
Teams we have worked with have been surprised at the difference between real-world dashboard metrics and what they observe in page profiling tools. Without more information, it is challenging for developers to understand if (and when) their applications may be misbehaving or are simply being loaded in a contended period.
A new ‘confidence’ field on the PerformanceNavigationTiming object will enable developers to discern if the navigation timings are representative for their web application.
Explainer:
https://github.com/MicrosoftEdge/MSEdgeExplainers/blob/main/PerformanceNavigationTiming%20for%20User%20Agent%20Launch/explainer.md
Chromium Status:
https://chromestatus.com/feature/5186950448283648
/cc @yoavweiss