Decision Trees with Differential Privacy #355
Feature Description
This feature will add Decision Trees to the PyDP library, along with the control mechanisms needed to use them as part of a pipeline.
Is your feature request related to a problem?
No. The request stems from PyDP's current lack of support for decision trees.
What alternatives have you considered?
No alternatives currently exist.
Are you interested in working on this yourself?
Yes.
Additional Context
Given that the scope of this issue is broad, it will eventually be broken down into smaller issues. Here is an outline of the functionality to be implemented:
Base Decision Tree Model
The base Decision Tree model: a vanilla ID3-based decision tree.
ID3 construction algorithm.
Differentially private algorithm: constructs and updates the tree in a differentially private way.
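To make the two building blocks above concrete, here is a minimal sketch in plain Python, not PyDP code. All function names (`information_gain`, `best_attribute`, `noisy_leaf_label`) are hypothetical. It assumes categorical attributes, add/remove-one neighbouring datasets (so each class count has sensitivity 1), and Laplace-noised leaf counts as one possible private step. The split selection shown is the ordinary, non-private ID3 criterion; a private variant would also need to protect that choice.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction from splitting on a categorical attribute (ID3 criterion)."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Non-private ID3 split choice: the attribute with the highest gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_leaf_label(labels, classes, epsilon, rng):
    """Majority class from Laplace-noised counts (assumes sensitivity 1 per count)."""
    counts = Counter(labels)
    noisy = {c: counts.get(c, 0) + laplace_noise(1 / epsilon, rng) for c in classes}
    return max(noisy, key=noisy.get)
```

`classes` is passed explicitly so that classes absent from a leaf still receive a noisy count; otherwise the set of observed labels would itself leak information.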
References:
Differentially Private Bagging
An algorithm to partition data multiple times in order to achieve differentially private subsample-and-aggregate.
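As a rough illustration of subsample-and-aggregate, the sketch below uses a single random partition into disjoint blocks, which is simpler than the repeated partitioning described above. The names `partition`, `train_majority`, and `private_predict` are hypothetical, and the majority-label learner is only a stand-in for a decision tree. It assumes `k` does not exceed the number of records, and that one record can flip at most one model's vote, so the vote histogram has L1 sensitivity 2.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def partition(records, k, rng):
    """Shuffle the records and split them into k disjoint blocks."""
    shuffled = records[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def train_majority(block):
    """Stand-in learner: always predicts the majority label of its block."""
    majority = Counter(label for _, label in block).most_common(1)[0][0]
    return lambda x: majority


def private_predict(models, x, classes, epsilon, rng):
    """Aggregate per-block votes, add Laplace noise, return the noisy winner."""
    votes = Counter(model(x) for model in models)
    # One record affects one block, so one vote can move between two classes:
    # the vote histogram changes by at most 2 in L1 norm.
    noisy = {c: votes.get(c, 0) + laplace_noise(2 / epsilon, rng) for c in classes}
    return max(noisy, key=noisy.get)
```

Each call to `private_predict` spends `epsilon` of the privacy budget, so answering many queries requires accounting for composition.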
References:
Support for Horizontally and Vertically Partitioned Data
This will extend the Private Decision Tree to work with horizontally distributed data (by means of incremental learning) and vertically distributed data.
For vertically distributed data, each stakeholder constructs a differentially private decision tree (assuming no overlap in the attributes), and the resulting trees are then combined by union.
References:
Boosted Differentially Private Ensembles
Add boosting to differentially private decision trees to enable ensemble formation.
References:
More details can be found here as well as in the reference papers.