Awesome Privacy Engineering

A curated list of resources related to privacy engineering

Content

Courses
Books
Data Deletion, Data Mapping, and Data Subject Access Requests
Privacy Tech Series
Privacy Threat Modeling
Machine Learning and Algorithmic Bias
Facial Recognition
De-Identification and Anonymization
Homomorphic Encryption
Tokenization
Secure Multi-Party Computation
Synthetic Data
Differential Privacy and Federated Learning
Designing for Trust with Users
Deceptive Design Patterns
Tagging Personally Identifiable Information
Regulatory and Framework Resources
Conferences
Career
Miscellaneous
Other Awesome Privacy Curations
Related GitHub Topics

Courses

OpenMined Courses
- Our Privacy Opportunity (Beginner) (7.7 hours)
- Introduction to Remote Data Science (Intermediate) (8 hours)
- Foundations of Private Computation (Intermediate) (60 hours)
- Federated Learning on Mobile (Intermediate) (40 hours) (Not Yet Released)
Data Privacy and Anonymization in Python - Datacamp course on learning to process sensitive information with privacy-preserving techniques.
Secure and Private AI (Udacity) - Udacity course that covers how to extend PyTorch with the tools necessary to train AI models that preserve user privacy.
Practical Data Ethics - This class was originally taught in-person at the University of San Francisco Data Institute in January-February 2020.
Privacy-Conscious Computer Systems - This class at Brown University (CSCI 2390) focuses on how to design computer systems that protect users' privacy.
Privacy by Design: Data Classification - LinkedIn Learning course by Nishant Bhajaria.
Privacy by Design: Data Sharing - LinkedIn Learning course by Nishant Bhajaria.
Implementing a Privacy, Risk, and Assurance Program - LinkedIn Learning course by Nishant Bhajaria.
Data Protocol - Courses to teach developers and technical professionals how to build products responsibly and partner with platforms effectively.
Carnegie Mellon University - Privacy Engineering Certificate - Four-week certificate program that revolves around a combination of mini-tutorials, class discussions, and hands-on exercises designed to ensure that students develop practical knowledge of all key privacy engineering areas.
Technical Privacy Masterclass - In four modules, this course from Privado is designed to provide privacy leaders and their teams with an overview of the pillars of a proactive privacy program.
Compliance Detective - A gamified approach to learning about privacy engineering, Compliance Detective (formerly Privacy Quest) uses challenges and competitions to build your privacy and security knowledge.
Hitchhiker's Guide to Privacy Engineering - The goal of this creative privacy project is to offer a fun, engaging, and immersive privacy learning experience for privacy lawyers to improve their technical privacy skills.

Books

The Privacy Engineer's Manifesto: Getting from Policy to Code to QA to Value (Michelle Dennedy, Jonathan Fox, Tom Finneran)
Information Privacy Engineering and Privacy by Design: Understanding Privacy Threats, Technology, and Regulations Based on Standards and Best Practices (William Stallings)
The Algorithmic Foundations of Differential Privacy (Cynthia Dwork, Aaron Roth)
Building an Anonymization Pipeline: Creating Safe Data (Luk Arbuckle, Khaled El Emam)
Strategic Privacy by Design (R. Jason Cronk)
The Architecture of Privacy: On Engineering Technologies that Can Deliver Trustworthy Safeguards (Courtney Bowman, Ari Gesher, John K. Grant, Daniel Slate, Elissa Lerner)
Data Privacy: A Runbook for Engineers (Nishant Bhajaria)
Privacy Design Strategies (The Little Blue Book) (Jaap-Henk Hoepman)
Privacy Is Hard and Seven Other Myths: Achieving Privacy through Careful Design (Jaap-Henk Hoepman)
Privacy Engineering: A Dataflow and Ontological Approach (Ian Oliver)
Practical Data Privacy (Katharine Jarmul) (and accompanying Jupyter notebooks)
Threat Modeling: Designing for Security (Adam Shostack)
Threat Modeling: A Practical Guide for Development Teams (Izar Tarandach, Matthew J. Coles)

Data Deletion, Data Mapping, and Data Subject Access Requests

Deleting Data Distributed Throughout Your Microservices Architecture - Microservices architectures tend to distribute responsibility for data throughout an organization. This poses challenges to ensuring that data is deleted.
Handling Data Erasure Requests in Your Data Lake with Amazon S3 Find and Forget - Amazon S3 Find and Forget enables you to find and delete records automatically in data lakes on Amazon S3.
- Amazon S3 Find and Forget
How to Delete User Data in an AWS Data Lake - This post walks through a framework that helps you purge individual user data within your organization’s AWS hosted data lake, and an analytics solution that uses different AWS storage layers, along with sample code targeting Amazon S3.
- Data Purging AWS Data Lake
Best Practices: GDPR and CCPA Compliance Using Delta Lake - Article that describes how to use Delta Lake on Databricks to manage General Data Protection Regulation (GDPR) and California Consumer Privacy Act (CCPA) compliance for a data lake.
Klaro! - Klaro is a simple consent management platform (CMP) and privacy tool that helps you to be transparent about the third-party applications on your website.
OpenDSR - A common framework enabling companies to work together to protect consumers' privacy and data rights (formerly known as OpenGDPR.)
PrivacyBot - PrivacyBot is a simple automated service to initiate CCPA deletion requests with data brokers. (deprecated)
Cookie Consent - An open-source, lightweight JavaScript plugin for alerting users about the use of cookies on a website. It is designed to help quickly comply with the European Union Cookie Law, CCPA, GDPR and other privacy laws.
Fides - An open-source tool that allows you to easily declare your systems' privacy characteristics, track privacy related changes to systems and data in version control, and enforce policies in both your source code and your runtime infrastructure.
- Fideslang - Open-source description language for privacy to declare data types and data behaviors in your tech stack in order to simplify data privacy globally. Supports GDPR, CCPA, LGPD and ISO 19944.
- Fidesops - DSAR Orchestration: Privacy Request automation to fulfill GDPR, CCPA, and LGPD data subject requests. (deprecated)
Privado - Privado is an open source static code analysis tool to discover data flows in the code. It detects the personal data being processed, and further maps the journey of the data from the point of collection to going to interesting sinks such as third parties, databases, logs, and internal APIs.
Detecting PII Using Amazon Comprehend - Using Amazon Comprehend to detect entities that contain personally identifiable information (PII) in a text document.
Octopii - Octopii is an open-source AI-powered PII scanner that can look for image assets such as Government IDs, passports, photos and signatures in a directory.
Data Profiler - DataProfiler is a Python library created by Capital One to make data analysis, monitoring, and sensitive data detection easy.
PII Catcher - Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub.

Privacy Tech Series by Lea Kissner

Interface Design: The Who/What/Where Rule
Vulnerability versus Incident
Deidentification versus Anonymization
Aggregating Over Anonymized Data
Thinking Through ACL-Aware Data Processing
Settings and Surfaces
Comprehensible Access Control Lists
Data Retention in a Distributed System
Setting Data Retention Timelines
Handling Human Names

Privacy Threat Modeling

LINDDUN - The LINDDUN privacy engineering framework provides systematic support for the elicitation and mitigation of privacy threats in software systems.
LINDDUN GO - LINDDUN GO is designed to give you a quick start to privacy threat modeling.
PLOT4AI - Privacy Library Of Threats 4 Artificial Intelligence (PLOT4AI) is a threat modeling library to help practitioners build responsible artificial intelligence.
Draw.io Libraries for Threat Modeling - Collection of custom libraries for using the Draw.io diagramming application for threat modeling.
xCompass - A privacy threat modeling persona framework that developers can use to test and document privacy threats, and find edge cases of privacy harm (formerly named Models of Applied Privacy (MAP)).
Privacy Adversarial Framework (PAF) - Developed by Facebook, the Privacy Adversarial Framework (PAF) is a knowledgebase of privacy-focused adversarial tactics and techniques that is heavily inspired by MITRE ATT&CK®.
PANOPTIC™ Privacy Threat Model - MITRE PANOPTIC™, the Pattern and Action Nomenclature Of Privacy Threats In Context, is a privacy threat taxonomy for breaking down and describing privacy attacks against individuals and groups of individuals.

Machine Learning and Algorithmic Bias

Aequitas - An open source bias audit toolkit developed by the Center for Data Science and Public Policy at the University of Chicago. It can be used to audit the predictions of machine learning based risk assessment tools, understand different types of bias, and make informed decisions about developing and deploying such systems.
Ethical Machine Learning - Spotting and Preventing Proxy Bias - Jupyter Notebook from rOpenSciLabs that explores several ways of detecting unintentional bias and removing it from a predictive model. (deprecated)
Fairness in Machine Learning Engineering - Google's Machine Learning Crash Course includes a 70-minute section on fairness.
How to Incorporate Ethics and Risk into Your Machine Learning Development Process - To help highlight ethics and risk in machine learning, this article looks at the six steps involved in developing an ML system, what happens in each step, and the risk and ethics questions that arise.
DrivenData: Deon - A command line tool to easily add an ethics checklist to your data science projects.
People + AI Guidebook - A friendly, practical guide that lays out some best practices for creating useful, responsible AI applications.
- Why Some Models Leak Data - Machine learning models use large amounts of data, some of which can be sensitive. If they're not trained correctly, sometimes that data is inadvertently revealed.
- Datasets Have Worldviews - Every dataset communicates a different perspective. When you shift your perspective, your conclusions can shift, too.
- Measuring Fairness - How do you make sure a model works equally well for different groups of people?
- How Randomized Response Can Help Collect Sensitive Information Responsibly - Giant datasets are revealing new patterns in cancer, income inequality and other important areas. However, the widespread availability of fast computers that can cross reference public data is making it harder to collect private information without inadvertently violating people's privacy. Modern randomization techniques can help preserve anonymity.
- Can a Model Be Differentially Private and Fair? - Training with differential privacy limits the information about any one data point that is extractable but in some cases there’s an unexpected side-effect: reduced accuracy with underrepresented subgroups disparately impacted.
- Hidden Bias - Models trained on real-world data can encode real-world bias. Hiding information about protected classes doesn't always fix things — sometimes it can even hurt.
- How Federated Learning Protects Privacy - With federated learning, it’s possible to collaboratively train a model with data from multiple users without any raw data leaving their devices.
Fairlearn - A Python package to assess and improve fairness of machine learning models.
InterpretML - A toolkit to help understand models and enable responsible machine learning.
ML Privacy Meter - A tool to quantify the privacy risks of machine learning models with respect to inference attacks, notably membership inference attacks.
Privacy Considerations in Large Language Models - The potential for models to leak details from the data on which they’re trained may be a concern for all large language models, and additional issues may arise if a model trained on private data were to be made publicly available.
Explaining Decisions Made with AI - Guidance by the UK's Information Commissioner's Office (ICO) and The Alan Turing Institute aims to give organisations practical advice to help explain the processes, services and decisions delivered or assisted by AI, to the individuals affected by them.
Responsible AI Toolbox - Responsible AI Toolbox is a suite of tools from Microsoft that provides a collection of model and data exploration and assessment user interfaces that enable a better understanding of AI systems. The Toolbox consists of four dashboards: an Error Analysis dashboard, an Interpretability dashboard, a Fairness dashboard, and a Responsible AI dashboard.
Of Oaths and Checklists - A checklist for people who are working on data projects, authored by DJ Patil, Hilary Mason, and Mike Loukides.
Intro to AI Ethics - A Kaggle Learn course to explore practical tools to guide the moral design of AI systems.
Failure Modes in Machine Learning - Documentation compiled by Microsoft regarding the different ways that machine learning can fail, both intentionally (through adversarial attack) and unintentionally (formally correct but completely unsafe outcome).
Apple Privacy-Preserving Machine Learning Workshop 2022 - In June 2022, Apple hosted the Workshop on Privacy-Preserving Machine Learning (PPML), which brought Apple and members of the academic research communities together to discuss the state of the art in the field of privacy-preserving machine learning through a series of talks and discussions. This post includes highlights from workshop discussions and recordings of select workshop talks.
Machine Unlearning - A compilation of existing literature about machine unlearning, a process through which a machine learning model can be made to forget one of its training data points.
Fairness and Machine Learning: Limitations and Opportunities - An online textbook by Solon Barocas, Moritz Hardt, and Arvind Narayanan.
AI Fairness 360 (AIF360) - A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
Model Card Toolkit - Google's Model Card Toolkit streamlines and automates generation of Model Cards, machine learning documents that provide context and transparency into a model's development and performance.
Responsible AI by Design - Microsoft's hub for policies, practices, and tools that make up its framework for Responsible AI by Design. Includes a Responsible AI Standard, Responsible AI Impact Assessment Guide, and Responsible AI Impact Assessment Template.
Private AI Bootcamp - YouTube playlist of lectures from the Private AI Bootcamp at Microsoft Research Redmond in December 2019.
Adversarial Robustness Toolbox (ART) - Python library from the Linux Foundation AI & Data Foundation (LF AI & Data) that enables developers and researchers to defend and evaluate machine learning models and applications against the adversarial threats of evasion, poisoning, extraction, and inference.
Trustworthy ML Initiative - The Trustworthy ML Initiative is a community of researchers and practitioners working on topics related to machine learning models and algorithms that are accurate, explainable, fair, privacy-preserving, causal, and robust.
AI Nutrition Facts Labels - Tool from Twilio that allows generation of AI Nutrition Labels intended to give consumers and businesses a more transparent and clear view into ‘what's in the box’.
Explainable Artificial Intelligence - This course syllabus from Harvard University aims to familiarize students with the recent advances in the emerging field of eXplainable Artificial Intelligence (XAI).
SecretFlow - SecretFlow is a unified framework for privacy-preserving data analysis and machine learning.
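
The randomized response technique mentioned in the People + AI Guidebook entries above can be sketched in a few lines. This is a minimal illustration using the classic two-coin variant, not code taken from any of the listed resources; the function names and the simulated 30% population are assumptions made for the example.

```python
import random


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """First coin: heads -> answer truthfully; tails -> report a second coin flip."""
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(responses) -> float:
    """Invert the noise: P(reported yes) = 0.5 * p + 0.25, so p = 2 * observed - 0.5."""
    observed = sum(responses) / len(responses)
    return 2 * observed - 0.5


# Simulated population (assumption for illustration): 30% would truthfully say "yes".
rng = random.Random(0)
true_answers = [i < 3000 for i in range(10000)]
reports = [randomized_response(t, rng) for t in true_answers]
estimate = estimate_true_rate(reports)
print(f"estimated rate of 'yes': {estimate:.3f}")
```

No individual report reveals the respondent's true answer with certainty, yet the aggregate rate can still be estimated, with accuracy improving as the number of respondents grows.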

Facial Recognition

Understanding Facial Detection, Characterization and Recognition Technologies (Future of Privacy Forum (FPF) Infographic)
Fawkes - Fawkes is a privacy-preserving tool against facial recognition systems, developed by researchers at SANDLab, University of Chicago.
LowKey: Leveraging Adversarial Attacks to Protect Social Media Users from Facial Recognition - Adversarial filter that accounts for the entire image processing pipeline and is demonstrably effective against industrial-grade pipelines that include face detection and large scale databases. Also includes an easy-to-use webtool that significantly degrades the accuracy of Amazon Rekognition and the Microsoft Azure Face Recognition API.
Magritte - Google's Magritte is a MediaPipe-based library to redact faces from photos and videos. It provides processing graphs to reliably detect faces, track their movements in videos, and disguise the person's identity by obfuscating their face.
Creating a Serverless Face Blurring Service for Photos in Amazon S3 - This blog post shows how to build a serverless face blurring service for photos uploaded to an Amazon S3 bucket.

De-Identification and Anonymization

A Visual Guide to Practical Data De-Identification (FPF Infographic)
NIST Privacy Engineering Program - De-Identification Tools
Presidio - Context aware, pluggable and customizable PII anonymization service for text and images, developed by Microsoft.
Redacting Sensitive Information with User-Defined Functions in Amazon Athena - Amazon Athena supports user-defined functions, a feature that enables you to write custom scalar functions and invoke them in SQL queries.
AWS AI-Powered Health Data Masking - The AI-Powered Health Data Masking solution in the AWS Solutions Library helps healthcare organizations identify and mask health data in images or text. (deprecated)
Anonymize Your Data Using Amazon S3 Object Lambda - Use Amazon S3 Object Lambda to anonymize data as it is retrieved.
Static Data Masking for Azure SQL Database and SQL Server - Microsoft's Static Data Masking is a data protection feature that helps users sanitize sensitive data in a copy of their SQL databases. It is compatible with SQL Server (SQL Server 2012 and newer), Azure SQL Database (DTU and vCore-based hosting options, excluding Hyperscale), and SQL Server on Azure Virtual Machines.
Google Cloud Data Loss Prevention - Google Cloud's fully managed service designed to help you discover, classify, and protect sensitive data.
ARX Data Anonymization Tool - ARX is a comprehensive open source software for anonymizing sensitive personal data.
UTD Anonymization ToolBox - UT Dallas Data Security and Privacy Lab compiled various anonymization methods into a toolbox for public use by researchers.
Kodex - An open-source toolkit for privacy and security engineering. It helps you to automate data security and data protection measures in your data engineering workflows.
Data Anonymizer Extension for PostgreSQL - A set of SQL functions that remove personally identifiable values from a PostgreSQL table and replace them with random-but-plausible values.
Anonimatron - Free, extendable, open source data anonymization tool.
Anonymizer MySQL - This simple tool allows you to make an anonymized clone of your database.
MySQL Data Anonymizer - MySQL Data Anonymizer is a PHP library that anonymizes your data in the database.
Anonymizer - Anonymizer is a universal tool to create anonymized DBs for projects.
anonymize-it - The Elastic Machine Learning Team's general purpose tool for suppression, masking, and generalization of fields to aid data pseudonymization.
Singapore Guide to Anonymization - The Singapore Personal Data Protection Commission (PDPC) has published the Guide on Basic Anonymization to provide more practical guidance for businesses on how to appropriately perform basic anonymization and de-identification of various datasets.
Transforming Data in Google Cloud Platform - This reference covers the available de-identification techniques, or transformations, that can be applied in Google Cloud's Data Loss Prevention (i.e., redaction, replacement, masking, crypto-based tokenization, bucketing, date shifting, and time extraction).
Measuring Re-Identification Risk in Datasets / Privacy Definitions - A series of helpful blog posts by Damien Desfontaines on the definitions of k-anonymity, k-map, l-diversity, and delta-presence. Additionally, a video by Utrecht University on the definition of t-closeness.
- k-anonymity
- k-map
- l-diversity
- delta-presence
- t-closeness
Technical Privacy Metrics: a Systematic Survey - Paper by Isabel Wagner and David Eckhoff that discusses over 80 privacy metrics and introduces categorizations based on the aspect of privacy they measure, their required inputs, and the type of data that needs protection. They also present a method on how to choose privacy metrics based on nine questions that help identify the right privacy metrics for a given scenario.
Data Anonymization Tool - The Singapore PDPC has launched a free Data Anonymization tool to help organizations transform simple datasets by applying basic anonymization techniques.
Masked AI - Python SDK and CLI wrappers that enable safer usage of public large language models (LLMs) like OpenAI/GPT4 by removing sensitive data from prompts and replacing it with fake data before submitting to the OpenAI API.
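
As a rough illustration of the k-anonymity definition covered in the re-identification risk entries above, the sketch below computes k for a small table: the size of the smallest group of records that share the same quasi-identifier values. The helper name, column names, and sample records are hypothetical and chosen only for the example.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return k: the size of the smallest group sharing the same quasi-identifier values."""
    if not rows:
        raise ValueError("cannot compute k-anonymity of an empty dataset")
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


# Hypothetical, already-generalized records (ZIP prefix and age bucket).
records = [
    {"zip": "021**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "021**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "021**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "021**", "age": "40-49", "diagnosis": "diabetes"},
    {"zip": "946**", "age": "30-39", "diagnosis": "flu"},
]

k = k_anonymity(records, ["zip", "age"])
print(k)  # the lone 946** record forms a group of size 1
```

A result of 1 means at least one person is unique on the chosen quasi-identifiers; further generalization or suppression would be needed to reach a higher k.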

Homomorphic Encryption

Building Safe A.I.: A Tutorial for Encrypted Deep Learning - Blog post on how to train a neural network that is fully encrypted during training.
HElib - HElib is an open-source software library that implements homomorphic encryption.
Microsoft SEAL - Microsoft SEAL is an easy-to-use open-source (MIT licensed) homomorphic encryption library developed by the Cryptography and Privacy Research group at Microsoft.
nGraph-HE: A Graph Compiler for Deep Learning on Homomorphically Encrypted Data - Intel Research proposes an extension to its deep learning compiler to operate on homomorphically encrypted data.
Google Fully-Homomorphic-Encryption - This repository created by Google contains open-source libraries and tools to perform fully homomorphic encryption operations on an encrypted data set.
Palisade Homomorphic Encryption Software Library - An open-source project that provides efficient implementations of lattice cryptography building blocks and homomorphic encryption schemes.
TFHE - The original version of TFHE (Fast Fully Homomorphic Encryption Library over the Torus) that implements the base arithmetic and functionalities (bootstrapped and leveled), allowing you to perform computations over encrypted data.
Concrete - The Concrete ecosystem is a set of crates (packages in the Rust language) that implements Zama's variant of TFHE, while most of the complexity of fully homomorphic encryption is hidden under high-level APIs.
FHE.org - Community of researchers and developers interested in advancing Fully Homomorphic Encryption (FHE) and other secure computation techniques.
blyss - Open-source SDK for accessing data privately using homomorphic encryption.
swift-homomorphic-encryption - Apple's open source Swift package that utilizes Private Information Retrieval (PIR).
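
To make "computing on encrypted data" concrete, here is a toy sketch of the Paillier cryptosystem, which is only additively homomorphic. It is a swapped-in teaching example and not the scheme used by the libraries listed above, which implement lattice-based schemes. The fixed primes are far too small for real use and all names are assumptions for illustration.

```python
import math
import secrets

# Toy parameters (assumption): two well-known Mersenne primes, insecure at this size.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q
N_SQ = N * N
G = N + 1  # standard simple choice of generator
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAMBDA, -1, N)  # modular inverse, valid because G = N + 1


def encrypt(message: int) -> int:
    """Encrypt an integer 0 <= message < N with fresh randomness."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, message, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(ciphertext: int) -> int:
    """Recover the plaintext using the private values LAMBDA and MU."""
    u = pow(ciphertext, LAMBDA, N_SQ)
    return ((u - 1) // N) * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N_SQ


encrypted_sum = add_encrypted(encrypt(20), encrypt(22))
print(decrypt(encrypted_sum))
```

The party performing `add_encrypted` never sees 20 or 22; only the key holder can decrypt the result. Fully homomorphic schemes extend this idea to support both addition and multiplication.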

Tokenization

AWS Serverless Tokenization - Learn how to use Lambda Layers to develop a serverless tokenization solution in AWS.
auto-data-tokenize - This repo demonstrates a reference implementation of detecting and tokenizing sensitive structured data within Google Cloud Platform.
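
The core idea behind tokenization can be sketched as a vault that swaps a sensitive value for a random token and keeps the mapping in a protected store. The `TokenVault` class below is a hypothetical in-memory illustration, not the design of either project above; a real deployment would use a hardened, access-controlled datastore.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault mapping sensitive values to random tokens."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        """Return the existing token for a value, or mint a new random one."""
        if value in self._token_by_value:
            return self._token_by_value[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._value_by_token:  # extremely unlikely collision
            token = "tok_" + secrets.token_hex(8)
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        """Look up the original value; only callers with vault access can do this."""
        return self._value_by_token[token]


vault = TokenVault()
card_token = vault.tokenize("4111 1111 1111 1111")
print(card_token)  # random token with no mathematical relation to the card number
```

Because the token is random rather than derived from the value, downstream systems can store and join on it without being able to recover the original data.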

Secure Multi-Party Computation

Private Join and Compute - Google's implementation of the "Private Join and Compute" functionality. This functionality allows two users, each holding an input file, to privately compute the sum of associated values for records that have common identifiers.
Facebook Private Computation Solutions - Facebook Private Computation Solutions (FBPCS) is a secure, privacy-safe and scalable architecture to deploy multi-party computation applications in a distributed way on virtual private clouds via Private Scaling architecture. FBPCS consists of various services and interfaces that enable private measurement solutions, e.g. Private Lift.
Facebook Private Computation Framework - Facebook Private Computation Framework (FBPCF) library allows developers to perform randomized controlled trials, without leaking information about who participated or what action an individual took. It uses secure multiparty computation to guarantee this privacy. FBPCF is for scaling multi-party computation up via threading.
EzPC (Easy Secure Multi-party Computation) - EzPC is a Microsoft Research tool that converts TensorFlow and ONNX models into Secure Multi-Party Computation protocols.
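
One common building block of secure multi-party computation is additive secret sharing, sketched below for computing a joint sum. This is an illustrative assumption and not the protocol used by any of the projects above; the modulus, function names, and salary figures are made up for the example.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo this value (illustrative choice)


def share(value: int, n_parties: int) -> list:
    """Split a value into n random-looking shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares) -> int:
    """Combine shares; any subset smaller than the full set reveals nothing."""
    return sum(shares) % MODULUS


# Three parties each hold a private input (hypothetical salaries).
salaries = [52000, 61000, 75000]

# Each party splits its input and sends share j to party j.
all_shares = [share(s, 3) for s in salaries]

# Party j locally adds the shares it received, one from each party.
partial_sums = [sum(column) % MODULUS for column in zip(*all_shares)]

# Publishing only the partial sums reveals the total, not the individual inputs.
total = reconstruct(partial_sums)
print(total)
```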

Synthetic Data

Data Synthesizer - DataSynthesizer generates synthetic data that simulates a given dataset.
Faker - Faker is a Python package that generates fake data for you.
Pynonymizer - Pynonymizer is a universal tool for translating sensitive production database dumps into anonymized copies.
Synthetic Data Vault - The Synthetic Data Vault (SDV) enables end users to easily generate Synthetic Data for different data modalities, including single table, multi-table and time series data.
Synthetic Data Generation: Quality, Privacy, Bias (Workshop at ICLR 2021) - Workshop on the intersection of challenges regarding quality, privacy and bias in synthetic data generation.
synthpop - R package for producing synthetic versions of microdata containing confidential information so that they are safe to be released to users for exploratory analysis.
Synthea - An open-source, synthetic patient generator that models the medical history of synthetic patients.
Presidio Evaluator - Data Generator - This data generator takes a text file with templates (e.g. "my name is [x]") and creates a list of InputSamples which contain fake PII entities instead of placeholders.
Mimesis - Mimesis is a high-performance fake data generator for Python, which provides data for a variety of purposes in a variety of languages.
plaitpy - plait.py is a program for generating fake data from composable yaml templates.
Bogus - Bogus is a simple fake data generator for .NET languages like C#, F# and VB.NET.
Gretel AI:
- Gretel Synthetics - Synthetic data generators for structured and unstructured text, featuring differentially private learning.
- GDPR Helpers - Generative models to automatically anonymize data to meet GDPR & CCPA standards.
- Anonymize Tabular Data to Meet GDPR Privacy Requirements - A blog post covering how to use Gretel's GDPR Helpers.
Differentially Private Synthetic Data via Foundation Model APIs (DPSDA) - This repo is a Python library to generate differentially private synthetic data without the need of any ML model training.

Differential Privacy and Federated Learning

A Friendly, Non-Technical Introduction to Differential Privacy - Blog post that provides simple explanations for the core concepts behind differential privacy.
A List of Real-World Uses of Differential Privacy - Blog post that compiles a list of real-world deployments of differential privacy, with their privacy parameters.
Differential Privacy at the U.S. Census Bureau - Video on how differential privacy is being implemented in the U.S. Census.
Privacy-Preserving AI - Video of Andrew Trask's talk on privacy-preserving AI, part of the MIT Deep Learning Series.
PySyft - PySyft is a Python library for secure and private Deep Learning.
CrypTen - CrypTen is a framework for Privacy Preserving Machine Learning built on PyTorch.
Opacus - A library that enables training PyTorch models with differential privacy.
Uber SQL Differential Privacy - This repository contains a query analysis and rewriting framework to enforce differential privacy for general-purpose SQL queries. (deprecated)
Google Differential Privacy Library - This repository contains libraries to generate ε- and (ε, δ)-differentially private statistics over datasets. Includes differential privacy "building block" libraries in C++, Go, and Java, as well as the following:
- Privacy on Beam - A differential privacy framework built on top of Apache Beam.
- Stochastic Tester - Used to help catch regressions that could make the differential privacy property no longer hold.
- Differential Privacy Accounting Library - Used for tracking privacy budget.
- ZetaSQL Differential Privacy Extension - Command line interface for running differentially private SQL queries with ZetaSQL.
- DP-Auditorium - Used for auditing differential privacy guarantees.
PyDP - Python wrapper for Google's Differential Privacy project. The library provides a set of ε-differentially private algorithms, which can be used to produce aggregate statistics over numeric data sets containing private or sensitive information.
IBM's Differential Privacy Library - Diffprivlib is a general-purpose library for experimenting with, investigating and developing applications in, differential privacy.
Microsoft's SmartNoise - This toolkit uses state-of-the-art differential privacy techniques to inject noise into data, to prevent disclosure of sensitive information and manage exposure risk.
NIST Differential Privacy Blog Series - This series is designed to help business process owners and privacy program personnel understand basic concepts about differential privacy and applicable use cases and to help privacy engineers and IT professionals implement the tools.
RAPPOR - Randomized Aggregatable Privacy-Preserving Ordinal Response (RAPPOR) is a technology for crowdsourcing statistics from end-user client software, anonymously, with strong privacy guarantees. (deprecated)
FedML - A federated learning and distributed training library for machine learning anywhere at any scale, backed by FedML, Inc. It supports large-scale geo-distributed training, cross-device federated learning on smartphones/IoTs, cross-silo federated learning on data silos, and research simulation. Best Paper Award at NeurIPS 2020.
FedJAX - Google's JAX-based open source library for federated learning simulations that emphasizes ease-of-use in research.
diffpriv: Easy Differential Privacy - R package that is an implementation of major general-purpose mechanisms for privatizing statistics, models, and machine learners, within the framework of differential privacy of Dwork et al. (2006).
sdcMicro: Statistical Disclosure Control Methods for Anonymization of Microdata and Risk Estimation - R package that can be used for the generation of anonymized (micro)data, i.e. for the creation of public- and scientific-use files.
PPRL: Privacy Preserving Record Linkage - R package that is a toolbox for deterministic, probabilistic and privacy-preserving record linkage techniques.
PipelineDP - Write fast, flexible pipelines that use modern techniques to aggregate user data in a privacy-preserving manner.
Compute Private Statistics with PipelineDP - This Google Developer Codelab walks through how to produce private statistics with differentially private aggregations using the PipelineDP Python framework.
Practical Differential Privacy w/ Apache Beam - Blog post showing how to use Privacy on Beam from Google's differential privacy library.
Computing Private Statistics with Privacy on Beam - This Google Developer Codelab walks through the use of Privacy on Beam to perform differentially private analysis in Go.
Tumult Analytics - Tumult Analytics is a Python library for computing aggregate queries on tabular data using differential privacy.
FLUTE - Created by Microsoft Research, Federated Learning Utilities and Tools for Experimentation (FLUTE) is a framework for running large-scale offline federated learning simulations.
Flower - Originated from a research project at the University of Oxford, Flower (flwr) is a framework for building federated learning systems with a goal to make federated learning accessible to everyone.
Federated Compute Platform - This Google repository hosts infrastructure for compiling and running federated programs and computations in the cross-device setting.
TensorFlow
- TensorFlow Privacy - Python library that includes implementations of TensorFlow optimizers for training machine learning models with differential privacy.
- TensorFlow Federated - TensorFlow Federated (TFF) is an open-source framework for machine learning and other computations on decentralized data.
- TensorFlow Encrypted - TF Encrypted is a framework for encrypted machine learning in TensorFlow.
Four-Episode Podcast on Differential Privacy by This Week in Machine Learning and AI:
- Differential Privacy Theory & Practice with Aaron Roth
- Differential Privacy at Bluecore with Zahi Karam
- Scalable Differential Privacy for Deep Learning with Nicolas Papernot
- Epsilon Software for Private Machine Learning with Chang Liu
Episode of This Week in Machine Learning and AI Podcast:
- Privacy-Preserving Decentralized Data Science with Andrew Trask
Four-Article Series on Differential Privacy by Singapore's Data Privacy Protection Capability Centre (DPPCC):
- Sharing Data with Differential Privacy: A Primer
- Practitioners’ Guide to Accessing Emerging Differential Privacy Tools
- Evaluating Differential Privacy Tools’ Performance
- Getting Started with Scalable Differential Privacy Tools on the Cloud
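
Many of the differential privacy tools listed above are built around adding calibrated noise to aggregate results. As a rough illustration of that idea only, here is a minimal sketch of the Laplace mechanism applied to a counting query, using just the Python standard library. The function names (`laplace_noise`, `dp_count`) are hypothetical and do not come from any listed library, and the sketch ignores details that production libraries handle, such as privacy budget accounting and floating-point side channels.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the values that satisfy `predicate`.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the Laplace noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 52, 67, 29, 38]  # illustrative data only
    rng = random.Random()
    # Smaller epsilon means more noise and stronger privacy.
    print(dp_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng))
```

For real workloads, prefer one of the maintained libraries listed in this section over a hand-rolled mechanism.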

Designing for Trust with Users

Data Permissions Catalogue - Catalogue created by the data consultancy IF to help teams make decisions about how, when, and why to collect and use data about people.
Privacy Patterns - UC Berkeley collection of design patterns attempting to standardize language for privacy-preserving technologies, document common solutions to privacy problems, and help designers identify and address privacy concerns.
How to Protect Your Users with the Privacy by Design Framework - Developers can help to defend their users’ personal privacy by adopting the Privacy by Design (PbD) framework.
The UX Guide to Getting Consent - Short guide by the International Association of Privacy Professionals (IAPP) about obtaining consent under the EU's GDPR.
Creepiness-Convenience Tradeoff - As people consider whether to use the new "creepy" technologies, they do a type of cost-benefit analysis weighing the loss of privacy against the benefits they will receive in return.
Building a Privacy Policy Users Actually Want to Read - Creation of a user-friendly privacy notice through privacy journeying and using a layered notice approach.
Contract Design Pattern Library - Library of guidelines, explanations, and examples to inspire and support you in exploring user-friendly approaches to contract simplification and visualization.
Privacy UX Series in Smashing Magazine:
Lean Privacy Review - Carnegie Mellon University researchers developed a fast, easy method to catch privacy issues early in a system’s development process by gathering feedback from users.

Deceptive Design Patterns

Deceptive Design Patterns - Deceptive design patterns (also known as "dark patterns") are tricks used in websites and apps that make you do things that you didn't mean to, like buying or signing up for something.
The Dark Side of UX Design - Practitioner-identified examples of stakeholder values superseding user values.
Dark Patterns Tipline - Gallery of deceptive patterns identified and submitted by individuals.
10 Examples of Manipulative Consent Requests - Blog post that illustrates ten examples of manipulative consent patterns in cookie banners.

Tagging Personally Identifiable Information

Managing Tags in AWS Resource Groups - Tags are words or phrases that act as metadata that you can use to identify and organize your AWS resources. A resource can have up to 50 user-applied tags.
Categorizing Your AWS S3 Storage Using Tags - In addition to data classification, tagging offers benefits such as fine-grained access control of permissions and object lifecycle management.
Quickstart for Tagging Tables in Google Cloud - This tutorial shows how to create a BigQuery dataset, copy data to a new table in your dataset, create a tag template, and attach the tag to your table.
Using Policy Tags in Google Cloud's BigQuery - Use policy tags to define access to your data, for example, when you use BigQuery column-level security.
Adding a Tag-Based PII Policy in Cloudera - How to add a PII tag-based policy. In this example, the author creates a tag-based policy for objects tagged "PII" in Atlas.
BigQuery PII Classifier - Google Cloud BigQuery PII Classifier is a solution to automate the process of discovering and tagging PII data across BigQuery tables and applying column-level access controls to restrict specific PII data types to certain users/groups.
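
The resources above share one pattern: attach a tag such as "PII" to a column or object, then let an access policy key off that tag. The sketch below illustrates the pattern in plain Python. It is not the API of AWS, BigQuery, or Apache Atlas; the `Column` type, the `redact_row` function, and the `"***"` mask are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """A column name plus the set of classification tags attached to it."""
    name: str
    tags: frozenset = frozenset()


def redact_row(row: dict, schema: list, allowed_tags: set) -> dict:
    """Return a copy of `row` with restricted values masked.

    A value is shown only if every tag on its column is in `allowed_tags`.
    Columns missing from the schema are masked too (fail closed), since
    nobody has classified them yet.
    """
    tags_by_column = {col.name: col.tags for col in schema}
    redacted = {}
    for name, value in row.items():
        tags = tags_by_column.get(name)
        visible = tags is not None and tags <= allowed_tags
        redacted[name] = value if visible else "***"
    return redacted


if __name__ == "__main__":
    schema = [Column("user_id"), Column("email", frozenset({"PII"}))]
    row = {"user_id": 7, "email": "a@example.com"}
    print(redact_row(row, schema, allowed_tags=set()))    # email masked
    print(redact_row(row, schema, allowed_tags={"PII"}))  # email visible
```

Masking unclassified columns by default is a design choice in this sketch; real platforms differ in how they treat untagged data.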

Regulatory and Framework Resources

Global Comprehensive Privacy Law Mapping Chart - The IAPP's Westin Research Center has created this chart mapping several comprehensive data protection laws.
US State Privacy Legislation Tracker - The IAPP Westin Research Center actively tracks the proposed and enacted comprehensive privacy bills from across the United States.
Privacy in M&A Transactions: The Playbook - The playbook is directed to mergers and acquisitions (M&A) and privacy teams to help identify potential privacy-related issues.
European Data Protection Supervisor Website Evidence Collector - The Website Evidence Collector tool automates the collection of evidence of personal data processing, such as cookies, or requests to third parties.
European Data Protection Board Website Auditing Tool - The Website Auditing Tool is used to collect evidence and generate reports regarding trackers that are being used by websites.
webXray - webXray is a tool for legal and compliance professionals to find privacy violations on the web.
GDPR Developer Guide - In order to assist web and application developers in making their work GDPR-compliant, France's Data Protection Agency, the CNIL, has drawn up a guide of best practices.
Data Protection/Privacy Mapping Project - Microsoft's Data Protection/Privacy Mapping Project facilitates consistent global comprehension and implementation of data protection with an open source mapping between ISO/IEC 27701 and global data protection and/or privacy laws and regulations.
European Data Protection Board Guidelines 4/2019 on Article 25, Data Protection by Design and by Default - This document gives general guidance on the obligation of Data Protection by Design and by Default set forth in Article 25 in the GDPR.
A Guide to Privacy by Design - This document by Spain's Data Protection Agency, AEPD, provides guidance on implementation of Privacy by Design into systems and applications.
Guidance on Anonymisation and Pseudonymisation - This document from the Irish Data Protection Commission (DPC) offers guidance on implementation of anonymization and pseudonymization.
Emerging Privacy Enhancing Technologies: Current Regulatory and Policy Approaches - The Organisation for Economic Co-operation and Development (OECD)'s report reviews recent technological advancements and evaluates the effectiveness of different types of privacy enhancing technologies (PETs), as well as the challenges and opportunities they present.
UN Guide on Privacy-Enhancing Technologies for Official Statistics - This United Nations (UN) document presents methodologies and approaches to mitigating privacy risks when using sensitive or confidential data.
An Introduction to Privacy Engineering and Risk Management in Federal Systems (NIST IR 8062) - National Institute of Standards and Technology (NIST) Internal Report 8062 provides an introduction to the concepts of privacy engineering and risk management for US federal systems.
De-Identifying Government Datasets: Techniques and Governance (NIST SP 800-188) - NIST Special Publication 800-188 describes how US federal agencies can de-identify datasets while still allowing meaningful statistical analysis.
Guidelines for Evaluating Differential Privacy Guarantees (NIST SP 800-226) - NIST Special Publication 800-226 is intended to help US federal agencies and practitioners of all backgrounds better understand how to evaluate promises made (and not made) when deploying differential privacy.

Conferences

USENIX Conference on Privacy Engineering Practice and Respect (PEPR)
- PEPR 2024 Conference | Videos
- PEPR 2023 Conference | Videos
- PEPR 2022 Conference | Videos
- PEPR 2021 Conference | Videos
- PEPR 2020 Conference | Videos
- PEPR 2019 Conference (slides only)
USENIX Enigma Conference
Symposium on Usable Privacy and Security (SOUPS)
International Workshop on Privacy Engineering (IWPE)

Career

What Does a Privacy Engineer Do, Anyway?
What to Expect in a Privacy Interview
Ethyca's Privacy Engineering Job Board

Miscellaneous

The World of Geolocation Data (FPF Infographic)
Data and the Connected Car (FPF Infographic)
Microphones and the Internet of Things (FPF Infographic)
GDPR – A Practical Guide For Developers
W3C Self-Review Questionnaire: Security and Privacy
OWASP Mobile Application Security Verification Standard (MASVS) - PRIVACY
Privacy is an Afterthought in the Software Lifecycle. That Needs to Change
How Uber is Approaching Data Privacy Architecture
Microsoft - Code with Engineering Playbook: Privacy Fundamentals
Private AI - PETs Decision Tree
IAPP Privacy Engineering Section
VISCHER's Website and App Tracking Legal Checklist
VISCHER's Marketing Communications Legal Checklist

Other Awesome Privacy Curations

awesome-data-privacy
awesome-federated-computing
awesome-gdpr
awesome-artificial-intelligence-guidelines
awesome-ml-privacy-attacks
awesome-privacy-on-blockchains
awesome-zero-knowledge-proofs
awesome-privacy-papers
awesome-ml-sp-papers
awesome-synthetic-data
awesome-privacy
awesome-threat-modeling
