Add confidence field to PerformanceNavigationTiming #202

Open · mwjacksonmsft opened this issue Jun 3, 2024 · 35 comments

@mwjacksonmsft

Web applications may suffer from bimodal distribution in page load performance, due to factors outside of the web application’s control. For example:

  • When a user agent first launches (a "cold start" scenario), it must perform many expensive initialization tasks that compete for resources on the system.
  • Browser extensions can affect the performance of a website. For instance, some extensions run additional code on every page you visit, which can increase CPU usage and result in slower response times.
  • When a machine is busy performing intensive tasks, it can lead to slower loading of web pages.

In these scenarios, content the web app attempts to load will be in competition with other work happening on the system. This makes it difficult to detect if performance issues exist within web applications themselves, or because of external factors.

Teams we have worked with have been surprised at the difference between real-world dashboard metrics and what they observe in page profiling tools. Without more information, it is challenging for developers to understand if (and when) their applications may be misbehaving or are simply being loaded in a contended period.

A new ‘confidence’ field on the PerformanceNavigationTiming object would enable developers to discern whether the reported navigation timings are representative of their web application’s typical performance.
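
Roughly, consumption might look like this (an illustrative sketch only: it assumes the field surfaces directly on the navigation entry as a plain "high"/"low" rating, and the '/rum' endpoint is a placeholder):

// Sketch: only beacon timings from loads the browser rates as representative.
const [nav] = performance.getEntriesByType('navigation');
if (nav.confidence === 'high') {
  navigator.sendBeacon('/rum', JSON.stringify(nav.toJSON()));
}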

Explainer:
https://github.com/MicrosoftEdge/MSEdgeExplainers/blob/main/PerformanceNavigationTiming%20for%20User%20Agent%20Launch/explainer.md

Chromium Status:
https://chromestatus.com/feature/5186950448283648

/cc @yoavweiss

@clelland (Contributor) commented Jun 4, 2024

@csharrison FYI

@csharrison

Thanks for tagging me. I am excited to see this proposal progress. On the last web perf call there was some mention of making this extensible to multiple data types beyond confidence on PerformanceNavigationTiming. Is that planned / likely? The reason I ask is that we may want to consider noising mechanisms that support multi-dimensional data to future-proof us in that case.

@mmocny commented Jun 4, 2024

Wanted to chime in with some soft feedback about just the API shape:

Right now the proposal is to just expose a confidence field with a literal primitive value ("high" or "low").

Any existing user / observer of the navigation timing API (there are lots) just looking at the raw output would need to know ahead of time that this is a "fuzzed" value with a specific epsilon. It feels to me like most folks in most situations would not have this extra context, and it would be worth making the API more self-documenting, as this is a very new use case...

Strawman: What about some value-wrapper to make it very explicit that the value is fuzzed?:

interface Fuzzy<T, U extends number> {
  fuzzyValue: T;
  epsilon: U;
}

Then confidence would be of type Fuzzy<string, 1.1>. This would be much more self-documenting for any readers, and would also probably extrapolate better for adding more such values?
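
For illustration (purely hypothetical usage of such a wrapper, not part of the proposal), a reader of the entry would then see the value and its noise parameter together:

// Hypothetical call site if confidence were wrapped:
const [entry] = performance.getEntriesByType('navigation');
const { fuzzyValue, epsilon } = entry.confidence; // e.g. { fuzzyValue: "high", epsilon: 1.1 }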


I do see a reference in the alternatives considered section to separating the ancillary data and exposing the triggerRate, which somewhat overlaps with this -- but that alternative evaluated grouping all the data values together and highlighted the complexities there.

I am not sure: is coupling all fuzzed values a necessary requirement in order to maintain unreliability? It seems to me like maybe not necessarily, unless the values being reported are inherently correlated?


I guess another alternative would just be to rely on naming convention: fuzzyMaybeConfidenceValue: "high" but I like that less.

@csharrison

+1 to including the epsilon value (or some rate of flipping) in the API itself. This provides a few benefits:

  • Different browsers can choose more or less strict values
  • The value can drift over time if we think our initial choices were wrong

In either of these cases, providing the value upfront ensures consumers can interpret the data properly (which is important for the debiasing step).
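
As a sketch of why exposing the rate matters (not part of the proposal; the function name and the example numbers are illustrative), a consumer who knows the randomization rate q can debias an aggregate share of "high" reports with the standard randomized-response correction:

// Sketch: debias an observed share of "high" confidence reports.
// q is the probability that a report was randomized (uniform over {high, low}).
function debiasHighShare(observedHighShare, q) {
  // E[observed] = (1 - q) * trueShare + q * 0.5, solved for trueShare.
  return (observedHighShare - q / 2) / (1 - q);
}

console.log(debiasHighShare(0.6, 0.5)); // ~0.7: 60% observed "high" implies ~70% truly "high"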

@mwjacksonmsft (Author)

@mmocny - Thanks for that feedback. I can update the proposal to ensure we capture the triggerRate as part of the API shape. I don't see anything in the webidl spec that describes exactly what you suggested. Do you know if that's possible?

@csharrison - Thanks! I think the next request would be "what conditions triggered this to be low confidence?".

We discussed something like this offline:

    cpuPressureState: { "nominal", "fair", "serious", "critical" }
    thermalsPressureState: { "nominal", "fair", "serious", "critical" }
    isColdStart: { true, false }
    userAgentPressureState: { "nominal", "fair", "serious", "critical" }
    gpuPressureState: { "nominal", "fair", "serious", "critical" }
    epsilon: <float>

This isn't an exhaustive list of conditions (https://github.com/w3c/web-performance/wiki/Nice-things-we-can%27t-have), and it already has a fairly high flip probability with RR. I'm concerned there may not even be enough data in a particular bucket to successfully debias the data, given the flip rate. I'm not familiar with the ins and outs of the more complex local differential privacy algorithms though, so open to ideas :)
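
For a sense of scale (a back-of-the-envelope sketch, not from the explainer), the probability that k-ary randomized response reports the true value falls off quickly as the number of distinct outputs grows:

// Sketch: chance that k-ary randomized response reports the true value.
function kRRTruthProbability(epsilon, k) {
  return Math.exp(epsilon) / (Math.exp(epsilon) + k - 1);
}

const eps = Math.log(3);
console.log(kRRTruthProbability(eps, 2));   // 0.75 for a binary "high"/"low" confidence bit
console.log(kRRTruthProbability(eps, 512)); // ~0.006 for the full dictionary of fields above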

One other idea I had, that might be a simpler(?) approach to answer why the confidence rating is low, is something like:

enum ConfidenceReason  {
    coldStart,
    cpuPressure,
    thermalsPressure,
    userAgentPressure,
    gpuPressure,
}

and then that could be exposed via

    sequence<ConfidenceReason> confidenceReasons;

But we'd probably want to cap the number of reasons returned, and maybe limit it being non-empty to only low confidence cases to help reduce the flip probability.

@csharrison commented Jun 5, 2024

@mwjacksonmsft thanks. One other question: let's say we expand this interface to support more data types - would we expect some users to still prefer a higher accuracy single confidence number rather than the more granular data you describe?

If so, we may want to consider a "query" type API, where rather than having some static fields on PerformanceNavigationTiming, we have a new method which allows the caller to e.g. query either the confidence bit, or the full conditions list you outlined, or both. This may allow some use-cases to get higher accuracy in return for coarser data.

For the two extensions you mentioned, there are:

  • 512 distinct outputs for the full output
  • 33 distinct outputs for the sequence<ConfidenceReason> output

I would also suggest considering an even simpler version of sequence<ConfidenceReason> that just emits a single confidence reason (if any), and picks one at random if multiple exist. This would only have 6 output states.

In any case, I have a colab showing how variance will change under both randomized response (krr) and the more advanced RAPPOR algorithm as the number of dimensions increases (with epsilon=ln 3, # of dimensions/outputs on x axis):
[chart: estimator variance for k-RR vs. RAPPOR as the number of dimensions/outputs grows, epsilon = ln 3]

It would not surprise me if in this high privacy regime, the utility will be bad as we increase dimensions from binary (which has variance < N).

@mwjacksonmsft (Author)

This issue came up for discussion today in the WG. To clarify my statement: I don't have an immediate ask from anyone for that, but I can see it being the next request. I'm reaching out to our customers to get their input.

What factors would need consideration if we were to extend this at a later point in time?

For example, you mentioned in the colab that "The other downside of RAPPOR is that in the epsilon=ln(3) range, RAPPOR underperforms k-RR until k>5.". If we picked RR now, would it be reasonable to switch the algorithm out when k > 5, if that were expressed in some form in the API?

@csharrison

@mwjacksonmsft I think the primary factor is just dealing with a breaking change. E.g. if we move from randomized response to RAPPOR, everyone will need to update their code to deal with a new format / debiasing strategy. To mitigate this, we could try to make the API forward-compatible with algorithm changes, but that increases complexity. LMK if it makes sense.

@mwjacksonmsft (Author)

@csharrison Could you elaborate on how the API shape might need to change to be forward compatible? I'd imagine that the debiasing strategy would be the more problematic aspect. Or am I missing something?

@csharrison

There is a spectrum of breakage:

  1. The flip probability of confidence changes. This is a small issue but resolvable by downstream systems with maybe a single line of code update.
  2. The debiasing strategy of confidence changes. This might be the case if, e.g., we change confidence to sometimes flip more from high to low than from low to high. To resolve this we may need to introduce more information about how confidence is being noised.
  3. We could just always set confidence to high (or low) but officially deprecate it in favor of a new mechanism.
  4. We could remove confidence in favor of a different mechanism entirely, which could break JS

I think (3) and (4) are probably the worst, so let me give you an example of how we could get there. Imagine we do some research and it turns out there are some use-cases that want to capture any ConfidenceReason, but some use-cases that really just care about coldStart. There will always be pressure to reduce the noise, so advocates of querying just coldStart will ask to spend the privacy budget allotted to the other signals (confidence, ConfidenceReason) on querying coldStart by itself, and get minimal noise.

This is a reasonable request! However, since we supply confidence directly on PerformanceNavigationTiming, it isn't an opt-in API; everyone just gets it automatically. This makes it difficult for a caller to explicitly say they don't want it because they want to spend the scarce privacy budget on a more tightly scoped query.

A possible alternative could be a dynamic method on PerformanceNavigationTiming like querySensitiveAttribute('confidence',...) which would do the privacy mechanism on-demand, and allow for more flexibility if we offered more data, more algorithms, etc. Maybe overkill, but worth thinking about if we're excited about future extensibility.
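
To make that shape concrete (a hypothetical sketch only: the method name comes from the paragraph above, while the attribute names and the promise-based signature are assumptions):

// Hypothetical: on-demand query that spends privacy budget only on what is asked for.
const [nav] = performance.getEntriesByType('navigation');

// Narrow query: a single bit, so minimal noise.
const coldStart = await nav.querySensitiveAttribute('coldStart');

// Broader query: more output states, therefore more noise per attribute.
const confidence = await nav.querySensitiveAttribute('confidence');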

@nicjansma

Note: this was discussed on the June 6 2024 W3C WebPerf Working Group call, minutes here.

There was some discussion on what "low confidence" means and a request to discuss some of the use-cases further.

@mwjacksonmsft (Author)

@csharrison Thanks. I've heard back from our customers. As long as the contributing factors remain outside of their control, they couldn't immediately think of a need to know why the confidence value was low.

Regarding the example that you highlighted, we need to think through if/how those values would be exposed via toJSON.

@mwjacksonmsft (Author)

I've pushed an updated version of the explainer that includes the randomizedTriggerRate field.

@csharrison

@mwjacksonmsft sounds good. If we're reasonably confident we can stick with just a confidence field for the time being, I am happy with the proposal as is (with the addition of randomizedTriggerRate). I do think that, for the privacy level we are considering, this will be best for callers vs. trying to get more data with more noise.

@yoavweiss (Contributor) commented Jun 14, 2024

Following the discussion here and looking at the explainer, I think we have two alternative API shapes.

NavigationTiming entry attribute

Adding a PerformanceNavigationTimingConfidence attribute on the performance timeline. To address @mmocny's concerns, we could either name the attribute something like randomizedConfidence or make sure that the internal value is named something like randomizedValue.

The pro of this approach is that we'd be attaching the value to the NavigationTiming timeline itself, making it clearly about navigation timing. It's also simple and discoverable.

The cons:

  • We won't have a direct opt-in to enable us to e.g. change the algorithm parameters later on.
    • We would be able to use the PerformanceObserver options, but that's a bit awkward.
  • If we'd ever need a confidence attribute that relates to other parts of the page loading journey, we'd need another mechanism to do so.
  • The confidence signal is not really a metric, so it feels a bit out of place on the entry (even though there are other non-metric attributes there, so this is admittedly not a strong argument)

performance.getConfidence(["navigation"], {})

A direct API that provides the confidence signal.

The pro of that approach is that the API would enable developers to provide parameters, and would enable us to evolve it over time (e.g. change the algorithm, provide confidence for other metrics, etc).

The con is that it'd be less discoverable.

I think it all boils down to how likely we think it is that we'd expand this signal over time and beyond navigation loading times. I'd love opinions on that front.
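
Side by side, the two shapes would read roughly like this (a sketch; the exact return shape of getConfidence, assumed here to be one value per requested entry type, is a placeholder rather than settled design):

// Option 1: attribute on the entry itself.
const [nav] = performance.getEntriesByType('navigation');
console.log(nav.confidence); // e.g. "high" / "low" (or a randomizedConfidence-style wrapper)

// Option 2: dedicated query API.
const [navConfidence] = performance.getConfidence(['navigation'], {});
console.log(navConfidence);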

@mwjacksonmsft (Author)

The second proposal is very similar to Add new Type value for performance.getEntriesByType

If we were to pursue that approach, and we expanded this signal beyond navigation load times, there would need to be some way of correlating the confidence value returned from the new API with the corresponding performance object. Two examples that come to mind are:

  1. Both could contain a unique identifier (e.g. the document name)
  2. We could pass the object into the API itself (performance.getConfidence(performance.getEntriesByType('navigation')[0])).

As randomized confidence data is only useful for backend processing, I expect developers will need to bundle it up with the original object and send both to the backend. Something like:

let navObj = performance.getEntriesByType('navigation')[0];
let confObj = performance.getConfidence(navObj);
navObj['confidence'] = confObj;
// Send to server

@ear-dev commented Jun 24, 2024

We are also frequently seeing bimodal distributions when analyzing website performance, and have started tracking some of these headers, which has given us more detailed information about “server think time”...

  • server-timing
  • x-edgeconnect-midmile-rtt
  • x-edgeconnect-origin-mex-latency
  • x-envoy-upstream-service-time
  • x-fastly-backend-reqs
  • x-akamai-transformed
  • server
  • x-powered-by
  • server-timing: cfRequestDuration

Has anyone else explored those to see if they are part of the reason for this behavior, and thought about how to include this in any 'confidence' interval?

This site is a good example: https://www.webpagetest.org/result/240624_AiDcCE_AR0/
Across three runs you can see these values in the root response headers:

  • "server-timing: cfRequestDuration;dur=356.000185, earlyhints",
  • "server-timing: cfRequestDuration;dur=21.000147"
  • "server-timing: cfRequestDuration;dur=70.999861, earlyhints"

And subsequently divergent FCP values which seem to be correlated:
[chart: FCP across the three runs, diverging in line with the cfRequestDuration values]

@mwjacksonmsft (Author)

Hi @ear-dev - The proposal has been mostly focused on factors that impact the user agent, so I hadn't considered server-side timings in this proposal.

In local testing, I do see these reflected in the `serverTiming` payload in `performance.getEntriesByType("navigation")[0]`:

{
    "name": "cfRequestDuration",
    "duration": 490.999937,
    "description": ""
}
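
For reference, those entries can be enumerated client-side via the standard serverTiming attribute (a small sketch):

// Log all Server-Timing metrics reported on the navigation request.
const [nav] = performance.getEntriesByType('navigation');
for (const { name, duration, description } of nav.serverTiming) {
  console.log(`${name}: ${duration}ms ${description}`);
}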

Does this information meet your needs to determine if the page is slow due to "server think time"?

@mwjacksonmsft (Author)

@yoavweiss I ended up building a couple of different prototypes to test out these options.

The first one attaches the confidence value to the PerformanceNavigationTiming object; this is what the explainer describes.

The second one uses a getConfidence method to return the confidence values for a given entry type.

The third one builds upon the first and allows a dictionary object to be passed via getEntriesByType. If the dictionary is not passed, then the confidence field returns null, otherwise it returns the expected value.

I think there are two main concerns I have with the second option. Firstly, developer ergonomic concerns - the data isn't useful locally, and the only thing you can do is bundle it up for backend processing. Secondly, providing too many configuration options could potentially introduce privacy concerns if called multiple times with different parameters. Admittedly it's hard to quantify that without a more concrete proposal. This second concern is equally applicable to the third option.

@yoavweiss (Contributor)

Apologies for my slowness.

I think that the main difference between 1 and 2/3 is related to future extensibility and ergonomics.

From my perspective option 2 is more flexible (e.g. we could extend it in the future to have different confidence levels per entry: if the page started loading under stress but then calmed down later, we'd be able to express that if we'd so wish).
The main question is if that flexibility comes at a cost of ergonomics and/or discoverability.

> I think there are two main concerns I have with the second option. Firstly, developer ergonomic concerns - the data isn't useful locally, and the only thing you can do is bundle it up for backend processing.

Sure, but that's also true for NavigationTiming. The data is only useful in aggregate, but with (1) if we want to know the confidence value of a specific entry, we'd need to inspect its relative NavigationTiming entry, which feels less ergonomic somehow.

> Secondly, providing too many configuration options could potentially introduce privacy concerns if called multiple times with different parameters.

I wouldn't expect the confidence level to change when inspected multiple times. Would it?

@noamr (Contributor) commented Aug 6, 2024

> One other idea I had, that might be a simpler(?) approach to answer why the confidence rating is low, is something like:
>
>     enum ConfidenceReason  {
>         coldStart,
>         cpuPressure,
>         thermalsPressure,
>         userAgentPressure,
>         gpuPressure,
>     }
>
> and then that could be exposed via
>
>     sequence<ConfidenceReason> confidenceReasons;

What would be the outcome of exposing these? What's the action a web author can take when "the confidence in the navigation entry is low because of thermal pressure"?

@mwjacksonmsft (Author) commented Aug 6, 2024

> What would be the outcome of exposing these? What's the action a web author can take when "the confidence in the navigation entry is low because of thermal pressure"?

@noamr I connected with our customers about this. Their feedback was that, as long as the contributing factors remain outside of their control, they couldn't think of a need to know why the confidence value was low. Consequently, I've dropped this from the proposal.

@mwjacksonmsft (Author)

@yoavweiss -

> I think that the main difference between 1 and 2/3 is related to future extensibility and ergonomics.
>
> From my perspective option 2 is more flexible (e.g. we could extend it in the future to have different confidence levels per entry: if the page started loading under stress but then calmed down later, we'd be able to express that if we'd so wish). The main question is if that flexibility comes at a cost of ergonomics and/or discoverability.

Is the suggestion that this API might return more than one entry per type or that we'd update the existing entries as new information became available? Or something else?

>> I think there are two main concerns I have with the second option. Firstly, developer ergonomic concerns - the data isn't useful locally, and the only thing you can do is bundle it up for backend processing.
>
> Sure, but that's also true for NavigationTiming. The data is only useful in aggregate, but with (1) if we want to know the confidence value of a specific entry, we'd need to inspect its relative NavigationTiming entry, which feels less ergonomic somehow.

In the prototype I built, I ended up with a preference for (1) for two reasons:

  1. The value returned by (2) needs to be used with performance entries to be meaningful.
  2. If this were to be extended to other performance entry types, the existing observer patterns continue to work.

However, I could see (2) being updated to take an entry instead of an entry type, which addresses those concerns. Perhaps something like:

let entries = performance.getEntriesByType("navigation");
let [confidence] = performance.getConfidenceForEntries(entries);

WDYT?

>> Secondly, providing too many configuration options could potentially introduce privacy concerns if called multiple times with different parameters.
>
> I wouldn't expect the confidence level to change when inspected multiple times. Would it?

@csharrison mentioned this:

> A possible alternative could be a dynamic method on PerformanceNavigationTiming like querySensitiveAttribute('confidence',...) which would do the privacy mechanism on-demand, and allow for more flexibility if we offered more data, more algorithms, etc.

I was expressing a concern that, if we allowed this, it might result in less privacy when called multiple times requesting different sensitive attributes.

@noamr (Contributor) commented Aug 6, 2024

I actually think the idea to hang this on the observer is the most consistent. Also, for navigation timing, reading this value before the load event might mean that it can still change (and perhaps the confidence value can change as well?), which makes the observer a better candidate than performance.get*.

Something like:
observer.observe({type: "navigation", metadata: ["confidence"], buffered: true}); or some such

@mwjacksonmsft (Author)

@noamr Were you thinking that the PerformanceObserverEntryList would have a getConfidenceEntries (or similarly named) method?

@noamr (Contributor) commented Aug 6, 2024

> @noamr Were you thinking that the PerformanceObserverEntryList would have a getConfidenceEntries (or similarly named) method?

No, I think that once you explicitly opted in to this in the observer, we can simply add confidence or some such on the regular timing entry.

@mwjacksonmsft (Author) commented Aug 6, 2024

@noamr Thanks for clarifying. I imagine in that case, if the developer calls let [entry] = window.performance.getEntriesByType('navigation'); before using an observer, then entry.confidence would return null.

However, if they held onto entry, and then called the observer, then entry.confidence would return the confidence value.

e.g.

let [entry] = window.performance.getEntriesByType('navigation');

// entry.confidence returns null here

const observer = new PerformanceObserver((list, obj) => {
  list.getEntries().forEach((entry) => {
    console.log(entry.confidence);
  });
});

observer.observe({type: "navigation", metadata: ["confidence"], buffered: true});

// entry.confidence returns a PerformanceTimingConfidence object.

Does that align with how you were thinking about it?

Here is a prototype of the API changes: 5766476: Prototype implementation confidence from observer | https://chromium-review.googlesource.com/c/chromium/src/+/5766476

@noamr (Contributor) commented Aug 7, 2024

> Does that align with how you were thinking about it?
>
> Here is a prototype of the API changes: 5766476: Prototype implementation confidence from observer | https://chromium-review.googlesource.com/c/chromium/src/+/5766476

Need to think about exact API names but this is the direction I was thinking about, yes.

@yoavweiss (Contributor)

The downside of having this only be available to performance observers is that it'd be impossible to collect this data for navigations that never make it to their load event.

@noamr (Contributor) commented Aug 8, 2024

> The downside of having this only be available to performance observers is that it'd be impossible to collect this data for navigations that never make it to their load event.

Can the confidence level change between receiving the response headers and the load event?

Also, is this planned to be exposed in iframes?

@mwjacksonmsft (Author)

> Can the confidence level change between receiving the response headers and the load event?

It seems unlikely, but I'm not sure. The use case for our customers is navigation that occurs during a user agent cold launch. We've discussed future potential factors such as extension impact, or other system resource considerations (e.g. high CPU usage). https://github.com/w3c/web-performance/wiki/Nice-things-we-can%27t-have

> Also, is this planned to be exposed in iframes?

I don't have a preference if this is exposed within iframes or not. For SystemEntropy, we did decide it shouldn't be exposed within iframes, but that was without any privacy protections.

@mwjacksonmsft (Author)

> Can the confidence level change between receiving the response headers and the load event?

@noamr Upon re-reviewing the data we collected, it suggests that most of the randomness occurred between navigationStart and responseEnd. There didn't appear to be much variation after domLoading. However, the caveat is that this data was narrowly collected for user agent launch scenarios.

@csharrison

I have one more small suggestion for this proposal regarding how developers deal with noise.
There is a bit of a "foot gun" in how developers split and aggregate data based on the noisy confidence bit, because the noise mechanism has bias. In the slides I presented to the group I have a formula to debias an aggregate, but it requires keeping the epsilon parameter around alongside records, and is an additional, somewhat non-trivial server-side step.

One simplification for developers would be for the platform to debias each report individually. This could look like exposing both a noisy confidence field, along with an unbiased_high_confidence_count with each navigation. Here is how it would work:

Imagine we have an epsilon = ln(3) as our privacy parameter.

  • If the return value from the randomized response is low, let unbiased_high_confidence_count = -.5
  • If the return value is high, let unbiased_high_confidence_count = 1.5

These numbers come from the formula $f(x) = \frac{x - p/2}{1-p}$ where $x$ is 1 for high confidence and 0 otherwise.

If you have a slice of records and you want to count how many are high confidence, you can just sum up each record's unbiased count without doing any other math and you will get an unbiased estimate of the total. For a histogram breakout, let the mass of each record in the histogram be its unbiased count, etc. etc.
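
In code, the per-record debiasing could look like this (a sketch following the formula above; the 50% randomization rate implied by epsilon = ln(3) and the records array are assumptions):

// Sketch: per-record unbiased count, computed by the platform or server side.
const p = 0.5; // randomization probability for epsilon = ln(3)

function unbiasedHighCount(reportedConfidence) {
  const x = reportedConfidence === 'high' ? 1 : 0;
  return (x - p / 2) / (1 - p); // 1.5 for "high", -0.5 for "low"
}

// records: assumed array of { confidence: "high" | "low", ... } collected server-side.
// Estimate how many records in a slice were truly "high" by summing per-record counts.
const estimatedHigh = records.reduce((sum, r) => sum + unbiasedHighCount(r.confidence), 0);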

@noamr (Contributor) commented Aug 30, 2024

> Upon re-reviewing the data we collected, it suggests that most of the randomness occurred between navigationStart and responseEnd. There didn't appear to be much variation after domLoading. However, the caveat is that this data was narrowly collected for user agent launch scenarios.

OK, I guess what feeds into confidence (e.g. cold start) is known when the document is created. Still, I think we should figure out if this feature should be available in iframes, to avoid a situation where multiple iframes are created to try to track changes in confidence continuously to reduce the noise.

@mwjacksonmsft (Author)

> OK, I guess what feeds into confidence (e.g. cold start) is known when the document is created. Still, I think we should figure out if this feature should be available in iframes, to avoid a situation where multiple iframes are created to try to track changes in confidence continuously to reduce the noise.

I'm comfortable with returning null for the confidence attribute within iframes. I don't believe in its current form this could be used to track changes in confidence in real time. I'd prefer to start with a scoped change and expand once we've had a chance to assess any potential additional privacy risk. WDYT?

aarongable pushed a commit to chromium/chromium that referenced this issue Oct 18, 2024
To enable developers to discern if the navigation timings are
representative for their web application, the change adds a new ‘confidence’ field to the PerformanceNavigationTiming struct.
The confidence field should not be populated until the confidence value
is finalized.

This change contains the blink changes to expose the new API, as well as
the infrastructure for the cross-process communication. This change
always returns a value of 'high' confidence for top level navigations,
and returns null for iframes. In future changes noise will be added
via a differential privacy algorithm.

The finalized confidence is stored in the DocumentLoadTiming class,
and is sent by the RenderFrameHostImpl in response to a notification
that the document is now 'interactive'.

Changes to add usecounters will be added after this change lands.

Explainer: https://github.com/MicrosoftEdge/MSEdgeExplainers/blob/main/PerformanceNavigationTiming%20for%20User%20Agent%20Launch/explainer.md
Chrome Status: https://chromestatus.com/feature/5186950448283648
Dev Design: https://docs.google.com/document/d/1D6DqptsCEd3wPRsZ0q1iwVBAXXmhxZuLV-KKFI0ptCg/edit?usp=sharing
I2P: https://groups.google.com/a/chromium.org/g/blink-dev/c/o0F7nBKsgg0/m/bJSp3ekfAAAJ
W3C Issue: w3c/navigation-timing#202

Manual testing steps:

1) Start the browser with these command line parameters:
   --enable-features=PerformanceNavigationTimingConfidence https://example.com
2) Open the developer tools, and switch to the Console tool.
3) Run "window.performance.getEntriesByType('navigation')[0].confidence"
   into the console, and you should see 'high' returned.

Bug: 1413848
Change-Id: I9590c6a3899aa756af6abc6d6c1a7c2b88bde439
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5906123
Auto-Submit: Mike Jackson <[email protected]>
Commit-Queue: Mike Jackson <[email protected]>
Reviewed-by: Yoav Weiss (@Shopify) <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1370718}