
Vector summation DP aggregation #264

Open
dvadym opened this issue Apr 18, 2022 · 7 comments
Labels
Type: New Feature ➕ Introduction of a completely new addition to the codebase

Comments

@dvadym
Collaborator

dvadym commented Apr 18, 2022

Context

DPEngine.aggregate performs DP aggregations of scalar values (sum, count, mean, etc.). The set of computed metrics is controlled by the metrics field of the aggregate_params argument.
The result of this function is a collection of (partition_key, named_tuple_with_requested_metrics).

Note: More details on the terminology are here.

Goals

Support of vector_sum in DPEngine.aggregate

The goal is to implement full support of vector_sum in DPEngine.aggregate, i.e. the values to aggregate are arrays of the same size, and the output is (partition_key, named_tuple["array_sum": sum_of_vectors_per_partition_key]).

References:

  1. All metrics are aggregated with combiners (e.g. SumCombiner).
  2. There is already a low-level function that applies the Laplace/Gaussian mechanism to np arrays.

This task can be split into 2 parts:

  1. Implementing VectorSumCombiner, which performs the aggregation
  2. Plumbing VectorSumCombiner into DPEngine.aggregate
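The combiner described in step 1 could look roughly like the following minimal sketch (names and the combiner interface are assumptions modeled on how combiners like SumCombiner typically work; noise addition is deliberately omitted and would be applied in compute_metrics in a real implementation):

```python
import numpy as np

class VectorSumCombiner:
    """Hypothetical sketch: accumulates per-partition vector sums.

    The accumulator is a single np.ndarray holding the running sum.
    A real DP implementation would add Laplace/Gaussian noise in
    compute_metrics before releasing the result.
    """

    def __init__(self, vector_size: int):
        self._vector_size = vector_size

    def create_accumulator(self, values) -> np.ndarray:
        # `values` is an iterable of equally sized arrays contributed
        # by one privacy unit to one partition.
        acc = np.zeros(self._vector_size)
        for v in values:
            acc += np.asarray(v, dtype=float)
        return acc

    def merge_accumulators(self, acc1: np.ndarray, acc2: np.ndarray) -> np.ndarray:
        # Vector sums merge by elementwise addition.
        return acc1 + acc2

    def compute_metrics(self, acc: np.ndarray) -> dict:
        # Placeholder: no noise is added in this sketch.
        return {"vector_sum": acc}
```

The key difference from SumCombiner is only the accumulator type (an np.ndarray instead of a scalar), which is confirmed later in this thread.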

Expose vector_sum computation to high-level Beam and Spark APIs.

High-level Beam and Spark APIs are represented by the PrivatePCollection and PrivateRDD classes and transformations on them. All DP computations are performed in DPEngine.
PrivatePCollection and PrivateRDD keep data in an internal collection (PCollection or RDD, correspondingly). They guarantee that only data that has been aggregated in a DP manner, using no more than the specified privacy budget, can be extracted.
Private Beam and Private Spark transformations are wrappers around DPEngine.aggregate. There are transformations for COUNT, MEAN, etc.

The variance transformation can be used as a good example:

  1. Beam implementation, tests.
  2. Spark implementation, tests
@dvadym dvadym added the Type: New Feature ➕ Introduction of a completely new addition to the codebase label Apr 18, 2022
@rialg
Contributor

rialg commented May 10, 2022

I can take a look at this one

@dvadym
Collaborator Author

dvadym commented May 10, 2022

Sure, go ahead! Thanks!

@rialg
Contributor

rialg commented May 12, 2022

IIUC, the VectorSumCombiner class will be similar to SumCombiner, but the AccumulatorType = np.ndarray. Is this correct?

@dvadym
Collaborator Author

dvadym commented May 12, 2022

Yes, correct

@rialg
Contributor

rialg commented May 12, 2022

In order to use add_noise_vector, an object of AdditiveVectorNoiseParams needs to be created. AFAIK, CombinerParams should contain the attributes needed to populate AdditiveVectorNoiseParams. Would it make sense to extend AggregateParams with the missing fields for the vector noise?

For instance:

    max_norm: float
    l0_sensitivity: float
    linf_sensitivity: float
    norm_kind: pipeline_dp.aggregate_params.NormKind

@dvadym
Collaborator Author

dvadym commented May 13, 2022

Good question. We need to introduce max_norm and norm_kind in AggregateParams. The other two fields can be derived from existing contribution bounds:

l0_sensitivity = max_partitions_contributed
linf_sensitivity = max_contributions_per_partition
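The mapping above could be sketched as follows (the field sets of both dataclasses are assumptions for illustration; the actual pipeline_dp definitions of AggregateParams and AdditiveVectorNoiseParams may differ):

```python
from dataclasses import dataclass

# Hypothetical, trimmed-down field sets for illustration only.
@dataclass
class AggregateParams:
    max_partitions_contributed: int       # per-privacy-unit partition bound
    max_contributions_per_partition: int  # per-partition contribution bound
    vector_max_norm: float                # proposed new field
    vector_norm_kind: str                 # proposed new field, e.g. "linf"

@dataclass
class AdditiveVectorNoiseParams:
    max_norm: float
    l0_sensitivity: float
    linf_sensitivity: float
    norm_kind: str

def to_noise_params(p: AggregateParams) -> AdditiveVectorNoiseParams:
    # Sensitivities follow directly from the contribution bounds,
    # per the mapping stated above.
    return AdditiveVectorNoiseParams(
        max_norm=p.vector_max_norm,
        l0_sensitivity=p.max_partitions_contributed,
        linf_sensitivity=p.max_contributions_per_partition,
        norm_kind=p.vector_norm_kind,
    )
```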

@rialg
Contributor

rialg commented May 13, 2022

I'm trying to include VectorSumCombiner in DPEngine.aggregate, but I would need to understand whether it should be used with the CompoundCombiner class. Should this case be handled as a separate branch in create_compound_combiner, similar to what happens with the metric pipeline_dp.Metrics.PRIVACY_ID_COUNT?
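For context, the branching being asked about might look like the following simplified sketch (the enum values, function shape, and use of placeholder combiner names are all assumptions; the real create_compound_combiner also requests privacy budget per metric):

```python
from enum import Enum, auto

class Metrics(Enum):
    COUNT = auto()
    SUM = auto()
    VECTOR_SUM = auto()

def create_compound_combiner(metrics):
    # Each requested metric maps to its own combiner; VECTOR_SUM would get
    # a dedicated branch, analogous to how PRIVACY_ID_COUNT is special-cased.
    # Combiners are represented by name strings here for simplicity.
    combiners = []
    for m in metrics:
        if m == Metrics.VECTOR_SUM:
            combiners.append("VectorSumCombiner")
        elif m == Metrics.COUNT:
            combiners.append("CountCombiner")
        elif m == Metrics.SUM:
            combiners.append("SumCombiner")
    return combiners
```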
