Output uniformly random guess at frequency distribution when reach is too small #1498

riemanli · 2024-02-21T23:03:08Z

The variance calculation for the frequency distribution outputs NaN when reach is zero. Moreover, the estimated variance is inaccurate when reach is impractically small. The fix is to check whether the reach is too small using its confidence interval: if the confidence interval of the reach contains values <= 0, we treat the reach as too small for an accurate variance estimate of the frequency distribution and output the variance of a uniformly random draw from [0, 1].
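A minimal sketch of the check described above, not the code in this PR. All identifiers here (`isReachTooSmall`, `relativeFrequencyVariance`, the constants, and the `estimate` callback) are hypothetical names chosen for illustration. The 1.96 multiplier is the 95% value quoted in the review thread, and 1/12 is the variance of a uniform distribution on [0, 1].

```kotlin
import kotlin.math.sqrt

// Hypothetical names throughout; these are not the identifiers used in the PR.

/** z-score of a two-sided 95% confidence interval. */
const val Z_SCORE_95 = 1.96

/** Variance of a uniformly random draw from [0, 1]: (1 - 0)^2 / 12. */
const val UNIFORM_UNIT_INTERVAL_VARIANCE = 1.0 / 12.0

/** Returns true if the interval reach +/- 1.96 * std contains a value <= 0. */
fun isReachTooSmall(reach: Double, reachVariance: Double): Boolean {
  val lowerBound = reach - Z_SCORE_95 * sqrt(reachVariance)
  return lowerBound <= 0.0
}

/**
 * Returns the fallback variance when the reach is too small; otherwise defers to
 * [estimate], which stands in for the regular variance formula (assumed, not shown).
 */
fun relativeFrequencyVariance(
  reach: Double,
  reachVariance: Double,
  estimate: () -> Double,
): Double =
  if (isReachTooSmall(reach, reachVariance)) UNIFORM_UNIT_INTERVAL_VARIANCE else estimate()

fun main() {
  // Reach of zero: the regular estimate would be NaN, so the fallback is used instead.
  println(relativeFrequencyVariance(0.0, 0.0) { Double.NaN })
  // Reach well above zero (std = 100): the regular estimate is used unchanged.
  println(relativeFrequencyVariance(1000.0, 100.0 * 100.0) { 0.002 })
}
```

Because the check runs before the regular estimate is evaluated, the NaN-producing path is never reached when reach is zero.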


riemanli · 2024-02-21T23:03:22Z

Current dependencies on/for this PR:

Output uniformly random guess at frequency distribution when reach is too small #1498 👈
main

This stack of pull requests is managed by Graphite.

jiayu-google

Reviewable status: 0 of 4 files reviewed, 1 unresolved discussion (waiting on @chenweiw and @riemanli)

src/main/kotlin/org/wfanet/measurement/measurementconsumer/stats/LiquidLegions.kt line 275 at r1 (raw file):

    // When reach is too small, we have little info to estimate frequency, and thus the estimate of
    // relative frequency is equivalent to a uniformly random guess at probability.

nit: "uniformly random guess of a probability in [0, 1]."

chenweiw

Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @riemanli)

src/main/kotlin/org/wfanet/measurement/measurementconsumer/stats/MeasurementStatistics.kt line 91 at r1 (raw file):

 * A reach result is considered too small when computing variances of relative frequency if the 95%
 * confidence interval of the reach covers 0 or negative values. The 95% confidence interval =
 * reach_result +/- 1.96 * reach_std.

Do we always use a 95% confidence interval here, or is the confidence level defined by the customer's input? If it is the former, then nothing needs to be changed.

riemanli

Reviewable status: 2 of 4 files reviewed, 1 unresolved discussion (waiting on @chenweiw and @stevenwarejones)

src/main/kotlin/org/wfanet/measurement/measurementconsumer/stats/LiquidLegions.kt line 275 at r1 (raw file):

Previously, jiayu-google wrote…

nit: "uniformly random guess of a probability in [0, 1]."

Done.

src/main/kotlin/org/wfanet/measurement/measurementconsumer/stats/MeasurementStatistics.kt line 91 at r1 (raw file):

Previously, chenweiw wrote…

Do we always use a 95% confidence interval here, or is the confidence level defined by the customer's input? If it is the former, then nothing needs to be changed.

It's hardcoded right now. I'm not sure it makes sense to let the user control this.
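For reference, with the hardcoded 95% level the check quoted above reduces to one inequality on the lower bound of the interval. The worked numbers below are made-up illustrative values, not results from this PR; $\hat{r}$ denotes the reach result and $\hat{\sigma}_r$ its standard deviation.

```latex
\[
\text{reach too small} \iff \hat{r} - 1.96\,\hat{\sigma}_r \le 0
\]
\[
\hat{r} = 100,\ \hat{\sigma}_r = 60:\quad 100 - 1.96 \times 60 = -17.6 \le 0 \quad (\text{too small})
\]
\[
\hat{r} = 1000,\ \hat{\sigma}_r = 100:\quad 1000 - 1.96 \times 100 = 804 > 0 \quad (\text{not too small})
\]
```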

chenweiw

Reviewable status: 2 of 4 files reviewed, all discussions resolved (waiting on @stevenwarejones)

chenweiw

Reviewable status: 2 of 4 files reviewed, all discussions resolved (waiting on @stevenwarejones)

stevenwarejones

Reviewed 2 of 4 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @riemanli)


riemanli requested review from chenweiw and jiayu-google February 21, 2024 23:08

jiayu-google approved these changes Feb 22, 2024


chenweiw approved these changes Feb 23, 2024


riemanli requested a review from stevenwarejones February 23, 2024 03:57

riemanli force-pushed the riemanli_fix_frequency_measurement_variance_when_reach_is_too_small branch from 0963363 to 675b83b Compare February 23, 2024 03:59

riemanli commented Feb 23, 2024


chenweiw approved these changes Feb 23, 2024


stevenwarejones approved these changes Feb 26, 2024


Output uniformly random guess at frequency distribution when reach is too small

30e69c9

riemanli force-pushed the riemanli_fix_frequency_measurement_variance_when_reach_is_too_small branch from 675b83b to 30e69c9 Compare February 26, 2024 22:57

riemanli enabled auto-merge (squash) February 26, 2024 22:58

riemanli merged commit 05abea3 into main Feb 26, 2024
5 of 6 checks passed

riemanli deleted the riemanli_fix_frequency_measurement_variance_when_reach_is_too_small branch February 26, 2024 23:09
