Feature/assoc demo: New algorithm to perform association analysis using 2 way marginal information and supporting changes. #48

ananthr · 2015-07-27T21:48:23Z

analysis/R/decode2way.R: new algorithm and helper functions are here
analysis/tools/sum_bits_assoc.py: Python client that computes the marginal inputs required by the decode2way algorithms, given RAPPOR inputs from stdin
Usage: sum_bits_assoc.py
The corresponding test, analysis/tools/sum_bits_assoc_test.py, exercises the Python file and documents the output format in detail
assoctest.sh: a test suite to run end-to-end simulations and compare EM and new algorithms.
Usage: ./assoctest.sh run-seq '^a-' 5 T
This runs all association tests 5 times in sequential processes, comparing both algorithms (T = true flag). Other options are documented in assoctest.sh
assoctest.sh requires parameters from tests/assoctest_spec.py and HTML plumbing in tests/make_assoc_summary.py and tests/assoctest.html
compare_assoc.R: the main R code that is used by assoctest
experimental/assoc contains some old files used in experimenting with the new assoc analysis code
quick_assoc.sh: a simple wrapper around R functions to run both new and old assoc algorithms on input map files, rappor reports, and params.
Usage: ./quick_assoc.sh [<EM also? T/F>].
Note: directory assumed to have some structure (see documentation in quick_assoc.sh)
setup.sh: updated to include new jsonlite library
tests/gen_true_values_assoc{_test}.R: R files that implement and test, respectively, the generation of distributions (correlated zipfians), analogous to tests/gen_true_values.R for RAPPOR histograms
tests/rappor_assoc_sim.py: tests/rappor_sim.py modified to process two variables at a time from a true values file for the purposes of assoctest.sh end-to-end simulations.
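As a rough illustration of what "2 way marginal information" could look like, the sketch below counts, for every pair of bit positions across two reported variables, how often each of the four bit combinations occurs. The function name, the integer encoding of reports, and the output layout are assumptions made for illustration; the actual format produced by sum_bits_assoc.py is documented in sum_bits_assoc_test.py and may differ.

```python
from collections import Counter


def two_way_bit_counts(report_pairs, num_bits1, num_bits2):
    """Count co-occurrences of bit values across two variables.

    report_pairs: iterable of (report1, report2) integers, one pair per client
                  (hypothetical encoding, chosen for this sketch).
    Returns a Counter keyed by (i, j, b1, b2), where b1 is bit i of report1
    and b2 is bit j of report2.
    """
    counts = Counter()
    for r1, r2 in report_pairs:
        for i in range(num_bits1):
            b1 = (r1 >> i) & 1
            for j in range(num_bits2):
                b2 = (r2 >> j) & 1
                counts[(i, j, b1, b2)] += 1
    return counts


# Three made-up client report pairs, two bits per variable.
pairs = [(0b01, 0b10), (0b11, 0b10), (0b00, 0b01)]
counts = two_way_bit_counts(pairs, 2, 2)
```

For each fixed pair of bit positions (i, j), the four counts sum to the number of reports, so they form a 2x2 contingency table per bit pair.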

The test parameters are defined in tests/regtest_spec.py. Basic usage is:

$ ./regtest.sh run-all

This runs all tests in parallel, and results in an HTML table with results.

- Calculate both false positives and false negatives in analyze.R. Refactor the function to be more symmetric.
- Refactor demo.sh a bit
- In gen_sim_input.py, get rid of hard-coded 7 values per client, and make it a parameter
- Change rappor_sim.py to use a -d <dist> flag, rather than separate flags
- Add test cases based on Chrome params
- Factor out util.sh script

andychu and others added 30 commits March 5, 2015 18:47

Name it analysis_tool, so it's clear that it's a command line tool.

6cf77bf

Integrated the pcls code. Runs on our Chrome test case.

0490f2c

Make all the calls to AnalyzeRAPPOR consistent with respect to positional and named arguments. Add some logging. Add parameter to shell function to remove actual strings from candidates.

09dd7f7

Add test case to show the problem.

c18d309

Use pcls from shell

971ca2e

- Add setup script to fully document R dependencies.

f2f3fa4

- Use Rscript in PATH when running tests

Forgot to add alternative file

a2796bd

Merge branch 'master' of github.com:google/rappor

100ea2b

Added TODO.

bc60e66

Merge pull request #25 from ananthr/master

0b3f02b

Added TODO.

Adding distribution metrics.

5e840c3

ProcessAll in analyze.R now returns metrics.

4b776d9

The metrics currently calculated are the L1 and L2 norms and a heuristic for whether a false positive was detected (a spurious candidate string that is reported in the RAPPOR estimate but doesn't exist in the real distribution).
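The metrics described here can be sketched in a few lines. This is a minimal Python illustration, not the code in analyze.R: the function name, the dict-based inputs, and the choice to treat missing strings as proportion 0 are all assumptions.

```python
import math


def distribution_metrics(actual, estimated):
    """Compare a true distribution with an estimated one.

    Both arguments map candidate strings to proportions (hypothetical
    representation). Strings missing from one side count as proportion 0.
    """
    keys = set(actual) | set(estimated)
    diffs = [estimated.get(k, 0.0) - actual.get(k, 0.0) for k in keys]
    return {
        "l1": sum(abs(d) for d in diffs),
        "l2": math.sqrt(sum(d * d for d in diffs)),
        # Strings the estimate reports that are absent from the real distribution.
        "false_positives": sorted(k for k in estimated if k not in actual),
    }


actual = {"a": 0.5, "b": 0.3, "c": 0.2}
estimated = {"a": 0.45, "b": 0.35, "d": 0.05}
metrics = distribution_metrics(actual, estimated)
```

In this made-up example the string "d" is flagged as a false positive because it appears only in the estimate.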

Cleaned up workflow for false positives.

49dd888

False positives now include rappor est. proportions.

8b1c029

Addressed some comments.

d0f38fe

Addressing comments.

8077d0d

L1 distance using merge.

ffcb437

Small typo.

525d5ba

Merge pull request #26 from ananthr/master

dfbb426

ProcessAll in analyze.R now returns metrics.

Add links to both papers

b9b413c

Fix typo in column label

4e3ac86

VALUES_PER_CLIENT defaults to 1

fedafc9

- Fix bug introduced by previous swap of f_bits and mask_indices

d933c77

- Rename to 'uniform' (probability 1/2) and 'f_mask' (probability f). Rewrite the comments.
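To make the two names concrete, here is a hedged sketch of how a 'uniform' mask (each bit set with probability 1/2) and an 'f_mask' (each bit set with probability f) can be combined to randomize a bit vector: where f_mask is set the output takes the uniform bit, elsewhere it keeps the original bit. The function names and the use of Python's random module are assumptions for illustration; the client code in this repository may generate its randomness differently.

```python
import random


def make_masks(num_bits, prob_f, rng):
    """Draw the two masks bit by bit.

    uniform: each bit is 1 with probability 1/2.
    f_mask:  each bit is 1 with probability prob_f.
    """
    uniform = 0
    f_mask = 0
    for i in range(num_bits):
        if rng.random() < 0.5:
            uniform |= 1 << i
        if rng.random() < prob_f:
            f_mask |= 1 << i
    return uniform, f_mask


def randomize_bits(bits, uniform, f_mask):
    """Keep the original bit where f_mask is 0, use the uniform bit where it is 1."""
    return (bits & ~f_mask) | (uniform & f_mask)


rng = random.Random(0)  # seeded only to make the sketch repeatable
uniform, f_mask = make_masks(8, 0.5, rng)
noisy = randomize_bits(0b10100000, uniform, f_mask)
```

With prob_f set to 0 the input passes through unchanged, and with prob_f set to 1 the output is just the uniform mask.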

Clarify assertion

af928ab

Fix lint errors

3d76af2

Fix incorrect fix of lint error

4b17a7c

Make lint happy

9a790b4

Add URL so it's easier to open in the browser.

05d3eb2

Fix off by one error noticed by Ilya

2075061

andychu and others added 19 commits July 16, 2015 14:10

Style fixes

5fdbfac

Merge pull request #45 from google/feature/py-cpp-client-2

c2a50f8

Make the Python simulation into a Python client library.

Merge branch 'master' into feature/assoc-demo

8546101

Incorporating changes from master

f73aac4

- resolved conflicts
- modified code to use new Encode interface
- modified rappor_assoc_sim.py to use same interface as rappor_sim.py

Added a test for gen_assoc_reports.R

ab78319

Also, some minor refactoring.

Replaced regtest_spec.py from master branch.

feee5d8

Moving deprecated code to experimental directory

7936fc9

A few fixes from code review.

3deceee

- uncommented experimental code in decode2way and documented it
- renamed function that processes assoc maps
- deleted params.csv

Addressing more review comments.

c0ea8cf

- inverted noise matrix outside loop
- renamed gen_assoc_reports
- added its test to test.sh
- make-summary now shows original dimensions for variables
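The first item is a standard loop-invariant optimization: when every iteration applies the same inverse, it can be computed once. The sketch below shows the idea on a generic per-bit noise channel and is not taken from decode2way.R. The parameters p and q (the probability of reporting 1 when the true bit is 0 or 1), the function names, and the table layout are all assumptions for illustration.

```python
def invert_2x2(m):
    """Closed-form inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("noise matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def kron(a, b):
    """Kronecker product of two 2x2 matrices, giving a 4x4 matrix."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def debias_pair_tables(observed_tables, p, q):
    """Undo per-bit noise on joint tables ordered (0,0), (0,1), (1,0), (1,1)."""
    # Rows are the reported bit, columns the true bit.
    channel = [[1 - p, 1 - q], [p, q]]
    inv = invert_2x2(channel)
    inv_joint = kron(inv, inv)  # computed once, outside the loop
    return [matvec(inv_joint, table) for table in observed_tables]


# Round trip on made-up numbers: add noise to a known table, then remove it.
true_table = [0.4, 0.1, 0.2, 0.3]
noisy_channel = [[0.75, 0.25], [0.25, 0.75]]  # p = 0.25, q = 0.75
observed = matvec(kron(noisy_channel, noisy_channel), true_table)
recovered = debias_pair_tables([observed], 0.25, 0.75)[0]
```

Because the inverse of a Kronecker product is the Kronecker product of the inverses, only the 2x2 channel needs to be inverted, and that happens once regardless of how many tables are processed.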

Adding sum_bits_assoc_test and fixing small error in assoctest.sh

870ee04

Adding sum_bits_assoc_test.py

964f8a9

Added a couple more tests to sum_bits_assoc_test

a4accc9

Adding compare_assoc.R instead of analyze_assoc_expt.R

e66ffd1

Code review changes

75120b9

- threw fitdistribution experimental code into separate function that is now only called by a flag passed to FitDistribution
- flag added to assoctest.sh to run comparisons to EM
- added package jsonlite to setup
- further documentation added in sum_bits_assoc

Remove display of compare flag in results.

92590b8

Reconciled with old decode.R for assoc pruning.

2108061

Fixed expected_f_2way in sum bits assoc test

22fa769

Wrapper for running quick analysis.

2174172

Clean up in assoctest.sh

bda7275

ananthr assigned andychu and ilyamironov and unassigned andychu and ilyamironov Jul 27, 2015

ananthr added 3 commits August 4, 2015 21:00

Modifications to work with basic associations.

5e665da

Rigging old EM code to work with Basic assoc.

45052b9

params causes a bug

bde82f4

tkaitchuck closed this Feb 5, 2016

tkaitchuck force-pushed the master branch from 6eb6ea1 to c195445 Compare February 5, 2016 21:01

tkaitchuck mentioned this pull request Feb 5, 2016

Feature/assoc demo #72

Open
