Skip to content

Latest commit

 

History

History
363 lines (278 loc) · 15.1 KB

auto_rules_extraction.md

File metadata and controls

363 lines (278 loc) · 15.1 KB

Firebase Auto Wipeout: Auto Rules Extraction

Dan Zhang, July 2017

Introduction

The Auto-Wipeout project is part of a Firebase project to help Firebase developer protect privacy of their end-users. The wipeout project is a cloud function library which can automatically delete user data on end-user account deletion. In this way, our customers (Firebase developers) can focus on making their apps special with a lower cost of writing safer apps.

In order to make using the wipeout library closer to a “one-click” experience, wipeout rules could be automatically generated by information inferred from the authorization rules. This doc explains technical details about the inference process from Real Time Database (RTDB) rules. For more general design and concepts, go to the main design doc.

RTDB rules

Quotes from page Understand Firebase Realtime Database Rules:

Firebase Realtime Database Rules determine who has read and write access to your database, how your data is structured, and what indexes exist. These rules live on the Firebase servers and are enforced automatically at all times. Every read and write request will only be completed if your rules allow it.

More detailed references can be found here: Firebase Database Security Rules API.

Traversal of the rule tree

The authorization rule of the RTDB is parsed to be a json object which naturally has a tree structure. We traverse the tree in a Breadth-First search for “.write” rules. Whenever we find a write rule, we first check the status implied by the write rule (discussed in "Single write rule analysis" below), and combine it with the status of its ancestor (discussed "Hierarchical write rules" below) to decide the status of the current location. If it turns out that the current place holds personal data, the patterns for the location (in the format described in the main design doc) will be recorded in wipeout configuration.

Single write rule analysis

First, we discuss how to deal with a single

Definitions:

Variable

The $location variable in firebase security rules. A variable that can be used to reference the key of a $location that was used earlier in a rule structure. For example, replacing the variable $uid in /users/$uid/ with the user name ‘Barney’ will produce path /users/Barney/.

Path Pattern

A sequence of constant and variables defining the path from root to the current location of the write rule. For example, /key/$k1/$k2 is a path pattern. It could match multiple places in the database with different values for the variables.

Auth placeholder

A special variable which could represent Firebase authentication uid of any single user. We use #WIPEOUT_UID as the auth placeholder to avoid potential conflicts with variables of the same name since # is not a valid path component in Firebase RTDB rules.

Access Pattern

Path patterns with variables replaced by auth placeholders according to the write rule. For a particular user with #WIPEOUT_UID, an access pattern describes one or a set of paths to places they have write access to. For example, /users/#WIPEOUT_UID is an access pattern describing a single path, and follow/$followee/#WIPEOUT_UID describes a set of path.

Free variable

Variables in access patterns which are not auth placeholders. They could take any value which matches the access pattern to a valid path in the database.

Principle and assumption

Principle

Only data at paths owned by the user could be deleted at wipeout.

Assumption

If a user is the only one with access to a particular path then that implies the user owns the path. (With the exception of creating an entry: when data.value == null, anyone can write to a place, but only the specific user can write to it afterwards. This case the user owns the data, but the write rule will be considered accessible by multiple users. This is not currently dealt with).

Given this assumption, if there’s only one valid access pattern for a path pattern (means any instance of the pattern can only be written by one user), then user with #WIPEOUT_UID owns the instance(s) of the access pattern.

Now the problem becomes how do we tell the numbers of access patterns? Recall access patterns are describe possible ways that variables in a path could be replaced by auth placeholder. And the auth placeholders will be later replaced by Firebase auth.uid. So we go into the write rule and look for variables that must match auth.uid. E.g. with rule auth.uid == $uid at path pattern /users/$uid, $uid is the variable we are looking for.

Since write rules could be a logical combination of many sub-rules, we normalize the equation to auth.uid == EXP. The underlying meaning is that: the write rule will not evaluates to false if and only if EXP evaluates to true (only considering auth). Here EXP can be an expression consists of conjunctions and disjunctions of variables or a fixed true or false value.

In order to make future reasoning easier, we convert EXP to Disjunctive Normal Form (DNF). In other word, EXP must be disjunction (OR) of one or more conjunctions (AND) of one or more literals (variable in our case).

Now it is clear that with EXP in DNF, each conjunctive clause maps to one access pattern. In conclusion, in order to find out path patterns with single access pattern, we only need to find out the EXP with single conjunctive clause in DNF.

An example

Here is an example based on the previous definition and reasoning.

Assume we have path pattern /key/$k1/$k2, the table below shows the corresponding outcomes with different write rules.

Write Rule Access Pattern(s) EXP in DNF Results
Auth.uid == $k1 /key/#WUID/$k2 ($k2) Single access
Auth.uid == $k2 /key/$k1/#WUID ($k1) Single access
Auth.uid == $k1 && Auth.uid == $k1 /key/#WUID/#WUID ($k1 && $k2) Single access
Auth.uid == $k1 || Auth.uid == $k1 /key/#WUID/$k2; /key/$k1/#WUID ($k1) | ($k2) Multiple access
Auth.uid != null - (True) Multiple access
Auth.uid == null - (False) No access
Auth.uid == SOME_FIX_ID - (False) No access

(In this table, use #WUID instead of #WIPEOUT_UID to save space.)

As shown in the table, a write rule could lead to three access results for a path pattern

  1. [No access] User with uid has no write access to the any instance of the path pattern. EXP has a fixed False value in this case. (Note we are looking for patterns for general users , so auth.uid == SOME_SPECIAL_UID also implies no access )
  2. [Single access] Any instance of the path pattern could be written with one specific auth uid. EXP should have only one conjunctive clause in DNF in this case.
  3. Multiple access] Instances of the path pattern could be written with more than one auth uids. EXP has more than one conjunctive clauses in DNF or has fixed True value in this case.

Implementation details

Representation

The operations of the DNF expressions are implemented in expression.js. An expression object has two fields, booleanValue and conjunctionLists. booleanValue can be true or false or undefined. conjunctionLists represents EXP, and is a list of lists, each inner list represents the list of literals in a conjunction clause, and the outer list represents the disjunction of all the conjunctive clauses.

When booleanValue is True or False, the value of EXP is fixed and the conjunctionList doesn’t matter, and it will be set to an empty list. When booleanValue is undefined, we go ahead and look at the conjunctionLists.

Building EXP

We build EXP for write rules in a bottom-up manners.

  • Any literal with the value true or false will produce EXP = True or EXP = False.
  • Equations (Binary expression) with auth.uid on one side will produce a EXP = $other_side_variable.

Operations

We always keep EXP in DNF, we could do “AND” and “OR” operations of EXPs.

  • AND of two DNF expressions: cross product of the two expressions, where product of two clauses means the union of the their literals.
  • OR of two DNF expressions: concatenate clauses from two expressions

Simplification

The result of operations may need to be simplified according to classical logic:

  • When there are true or false in the expression:
    • A & 0 = 0, A | 1 = 1
    • A & 1 = A, A | 0 = A
  • For literals and conjunction clauses, sort and remove duplicates because:
    • Idempotence: A & A = A, A | A = A
    • Commutativity: A & B = B & A, A | B = B | A
  • Remove redundant clauses according using absorption
    • Absorption: A | (A & B) = A

Hierarchical write rules

As stated in the previous section, there are three possible access status associated with a write rule, they are: NO_ACCESS, SINGLE_ACCESS, MULT_ACCESS. However, since write rules cascade, getting the access status from a single rule is not enough.

Note: Shallower security rules override rules at deeper paths. Child rules can only grant additional privileges to what parent nodes have already declared. They cannot revoke a read or write privilege.

Child rule can grant additional access, thus the access status of the child node not only depends on the child rule, but also the status of its parent.

For example, parent /keys/$k1 with rule auth.uid == $k1 has SINGLE_ACCESS, and child /keys/$k1/$k2 with rule auth.uid == $k2. The child rule itself is single access, but for the node, it is granting additional access besides its parent. So for the child node, the resulting access patterns are [/keys/#WUID, /keys/$k1/#WUID] and the access status is MULT_ACCESS then.

The table below shows ACCESS of child node according to parent node access and child rule access

PARENT-NO PARENT-SINGLE PARENT-MULT
CHILD-NO NO PARENT-SINGLE MULT
CHILD-SINGLE CHILD-SINGLE PARENT-SINGLE/MULT MULT
CHILD-MULT MULT MULT MULT

Note with Parent and child node both having SINGLE_ACCESS, there are two possible outcomes. We need to look into the child rule and see if it grants additional access to its parent. This is done by looking at the conjunction list of parent and child rule. Recall with SINGLE_ACCESS, there should be only one conjunction, each literal in the conjunction adds a restriction to auth.uid. So we look at every literal in the parents conjunction and see if any restriction has been removed in the child rule, if so the child node has MULT_ACCESS, else the child node will have same SINGLE_ACCESS as its parent.

Following the previous example, the parent node has conjunction [$k1] and the child rule has conjunction [$k2]. The restriction that auth.uid == $k1 has been removed in the child node this means the child rule granted additional access and the child node has MULT_ACCESS. If the child rule has conjunction [$k1,$k2], the child rule is adding additional restriction to the parent node while keeping the old restriction and no additional access granted. Since child nodes can’t revoke privilege, the child node will inherit access from its parent.

Implementation details

A new class Access is introduced to describe the access status of nodes and rule. It has two member:

  • accessStatus: NO/SINGLE/MULT_ACCESS
  • variableList: Empty list if accessStatus is NO/MULT; list of literals in conjunction if access is SINGLE

An access object can be created from expression object of write rules. Static method nodeAccess() calculates child node access from child rule access and parent node access.

Data references

Besides variables in the path, Firebase RTDB security rule also supports references to data through methods like child(), and parent(). Our auto extraction also supports a subset of references as listed below

  • Predefined variables:
    • data: corresponds to the current data in RTDB at the location of the currently rule. [supported]
    • root: corresponds to the data at the root of the RTDB. [supported]
    • newData :corresponds to the data that will result if the write is allowed. [ignored] because the restriction on new data is not related to the privilege to write (access) at a place().
  • Referencing methods
    • child(): Gets a reference for the location at the specified relative path. One single argument required to specify the path.[supported]
    • parent(): Gets a reference for the location parent path. No argument. [supported]
  • Evaluating methods
    • val(): Gets the value from the reference. [supported]
    • exists(): Return true if the reference contains any data. [supported]

Data references are evaluated to a string and treated as a normal literal/identifier when we parse the write rule at a particular location. They may imply additional conditions to the access pattern Here are some examples:

With path = /user/data/$uid

  • newData.val() => undefined
  • data.val() => val(rules,user,data,$uid)
  • data.exists() => exists(rules,user,data,$uid)
  • data.child(‘name’).val() => val(rules,user,data,$uid,name)
  • data.child(‘name’).parent().child(‘age’).val() => val(rules,user,data,$uid,age)
  • data.parent().child(auth.uid).val() => val(rules,user,data,#WIPEOUT_UID)
  • root.child(‘data’).child(data.child(‘friend’)).val() => val(rules,user,data,val(rules,user,data,$uid,friend))

Conditions

Besides restrictions on authentication id, there are also additional conditions on variables in the path or data references. We include these conditions in a ‘condition’ field in wipeout rules.

Condition could be

  • A single data reference with exists() value, like data.exists()
  • Or a binary expression of Literal/Identifier/data references connected by operators like ‘==’ , ‘<’ , ‘>’, ’!=’... For example, data.val() != null.

Conditions expression additional restrictions on write accesses, and they have a ‘AND’ relationship restrictions on authentication information.

In terms of implementation, conditions are associated with expressions and access objects. An Expression object with booleanValue True or Undefined can have valid condition. Only Access objects with with SINGLE_ACCESS have conditions.

When we do AND and OR operations of Expressions, their conditions will be joined by the corresponding operator. When we are calculating child node Access based on parent node Access and child rule Access, there’s only one case where conditions will be involved: parent and child rule both have SINGLE_ACCESS and child node have SINGLE_ACCESS. In this case the resulting condition of child node is the OR of the two conditions.