From 2be048e87da0b605d77e8960a67090ebd177bc39 Mon Sep 17 00:00:00 2001
From: jmacd
Date: Thu, 7 Nov 2019 23:05:28 -0800
Subject: [PATCH 1/2] Update README with examples and godoc link

---
 README.md | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/README.md b/README.md
index 1cdb63c..82cc1c4 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,7 @@
+[![Docs](https://godoc.org/github.com/lightstep/varopt?status.svg)](https://godoc.org/github.com/lightstep/varopt)
+
+# VarOpt Sampling Algorithm
+
 This is an implementation of VarOpt, an unbiased weighted sampling
 algorithm described in the paper [Stream sampling for variance-optimal
 estimation of subset sums](https://arxiv.org/pdf/0803.0473.pdf) (2008)
@@ -12,3 +16,35 @@ often useful in conjunction with weighed reservoir sampling, using
 Algorithm R from
 [Random sampling with a reservoir](https://en.wikipedia.org/wiki/Reservoir_sampling#Algorithm_R)
 (1985) by Jeffrey Vitter.
+
+# Usage: Natural Weights
+
+A typical use of VarOpt sampling is to estimate network flows using
+sample packets. In this use-case, the weight applied to each sample
+is the size of the packet. Because VarOpt computes an unbiased
+sample, the sample data points can be summarized along secondary
+dimensions. For example, we can select a subset of the sample
+according to a secondary attribute, sum the sample weights, and the
+result is the expected value of the secondary attribute in the original
+population.
+
+See [weighted_test.go](https://github.com/lightstep/varopt/blob/master/weighted_test.go) for an example.
+
+# Usage: Inverse-probability Weights
+
+Another use for VarOpt sampling uses inverse-probability weights to
+estimate frequencies while simultaneously controlling sample
+diversity. Suppose a sequence of observations can be naturally
+categorized into N different buckets. The goal in this case is to
+compute a sample where each bucket is well represented, while
+maintaining frequency estimates.
+
+In this use-case, the weight assigned to each observation is the
+inverse probability of the bucket it belongs to. The result of
+weighted sampling with inverse-probability weights is a uniform
+expectation: in this example, we expect an equal number of observations
+falling into each bucket. Each observation represents a frequency of
+its sample weight (computed by VarOpt) divided by its original weight
+(the inverse-probability).
+
+See [frequency_test.go](https://github.com/lightstep/varopt/blob/master/frequency_test.go) for an example.

From 08e548be48e5ad1232d9e340f17e96da3bc660c1 Mon Sep 17 00:00:00 2001
From: jmacd
Date: Thu, 7 Nov 2019 23:08:10 -0800
Subject: [PATCH 2/2] Update

---
 README.md | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 82cc1c4..ba73609 100644
--- a/README.md
+++ b/README.md
@@ -17,7 +17,7 @@ Algorithm R from
 [Random sampling with a reservoir](https://en.wikipedia.org/wiki/Reservoir_sampling#Algorithm_R)
 (1985) by Jeffrey Vitter.
 
-# Usage: Natural Weights
+## Usage: Natural Weights
 
 A typical use of VarOpt sampling is to estimate network flows using
 sample packets. In this use-case, the weight applied to each sample
@@ -30,7 +30,7 @@ population.
 
 See [weighted_test.go](https://github.com/lightstep/varopt/blob/master/weighted_test.go) for an example.
 
-# Usage: Inverse-probability Weights
+## Usage: Inverse-probability Weights
 
 Another use for VarOpt sampling uses inverse-probability weights to
 estimate frequencies while simultaneously controlling sample
@@ -48,3 +48,11 @@ its sample weight (computed by VarOpt) divided by its original weight
 (the inverse-probability).
 
 See [frequency_test.go](https://github.com/lightstep/varopt/blob/master/frequency_test.go) for an example.
+
+## Usage: Merging Samples
+
+VarOpt supports merging independently collected samples one
+observation at a time. This is useful for building distributed
+sampling schemes.
In this use-case, each node in a distributed system
+computes a weighted sample. To combine samples, simply input all the
+observations and their corresponding weights into a new VarOpt sample.
\ No newline at end of file
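
The two estimates described in the README text of this patch (summing sample weights over a subset, and dividing a sample weight by the original inverse-probability weight) reduce to simple arithmetic once a sample is in hand. The following is a minimal sketch of that arithmetic only. It does not call the varopt package: `SampledItem`, `EstimateSubsetSum`, and `EstimateFrequency` are hypothetical names introduced for illustration, and the adjusted weights are made-up inputs standing in for whatever the sampler reports.

```go
package main

import "fmt"

// SampledItem is a hypothetical record for one observation retained in a
// weighted sample. It is not a type from the varopt package.
type SampledItem struct {
	Attribute      string  // secondary attribute, e.g. a protocol name
	OriginalWeight float64 // weight supplied when the observation was added
	AdjustedWeight float64 // sample weight reported by the sampler (assumed given)
}

// EstimateSubsetSum sums the adjusted weights of the sampled items selected
// by keep. With an unbiased sample, this estimates the total original weight
// of the matching items in the full population.
func EstimateSubsetSum(sample []SampledItem, keep func(SampledItem) bool) float64 {
	var total float64
	for _, item := range sample {
		if keep(item) {
			total += item.AdjustedWeight
		}
	}
	return total
}

// EstimateFrequency returns the number of original observations that one
// sampled item represents when inverse-probability weights were used:
// the adjusted weight divided by the original weight.
func EstimateFrequency(item SampledItem) float64 {
	return item.AdjustedWeight / item.OriginalWeight
}

func main() {
	// Illustrative values only; a real sample would come from the sampler.
	sample := []SampledItem{
		{Attribute: "tcp", OriginalWeight: 1500, AdjustedWeight: 2000},
		{Attribute: "udp", OriginalWeight: 500, AdjustedWeight: 2000},
		{Attribute: "tcp", OriginalWeight: 4000, AdjustedWeight: 4000},
	}

	isTCP := func(item SampledItem) bool { return item.Attribute == "tcp" }
	fmt.Println("estimated tcp bytes:", EstimateSubsetSum(sample, isTCP))

	bucketed := SampledItem{Attribute: "bucket-a", OriginalWeight: 4, AdjustedWeight: 8}
	fmt.Println("estimated frequency:", EstimateFrequency(bucketed))
}
```

Keeping the original weight next to the adjusted weight on each retained item is what makes the frequency estimate possible after the fact, so a caller using inverse-probability weights would likely want to store both.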