-
Notifications
You must be signed in to change notification settings - Fork 0
/
bridge-db-spec.txt
281 lines (233 loc) · 13.4 KB
/
bridge-db-spec.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
BridgeDB specification
Karsten Loesing
Nick Mathewson
0. Preliminaries
This document specifies how BridgeDB processes bridge descriptor files
to learn about new bridges, maintains persistent assignments of bridges
to distributors, and decides which bridges to give out upon user
requests.
Some of the decisions here may be suboptimal: this document is meant to
specify current behavior as of April 2011, not to specify ideal
behavior.
1. Importing bridge network statuses and bridge descriptors
BridgeDB learns about bridges from parsing bridge network statuses and
bridge descriptors as specified in Tor's directory protocol.
BridgeDB parses one bridge network status file first and at least one
bridge descriptor file afterwards.
BridgeDB scans its files on sighup.
BridgeDB does not validate signatures on descriptors or networkstatus
files: the operator needs to make sure that these documents have come
from a Tor instance that did the validation for us.
1.1. Parsing bridge network statuses
Bridge network status documents contain the information which bridges
are known to the bridge authority and which flags the bridge authority
assigns to them.
We expect bridge network statuses to contain at least the following two
lines for every bridge in the given order:
"r" SP nickname SP identity SP digest SP publication SP IP SP ORPort
SP DirPort NL
"s" SP Flags NL
BridgeDB parses the identity from the "r" line and the assigned flags
from the "s" line.
BridgeDB memorizes all bridges that have the Running flag as the set of
running bridges that can be given out to bridge users.
BridgeDB memorizes assigned flags if it wants to ensure that sets of
bridges given out should contain at least a given number of bridges
with these flags.
1.2. Parsing bridge descriptors
BridgeDB learns about a bridge's most recent IP address and OR port
from parsing bridge descriptors.
In theory, both IP address and OR port of a bridge are also contained
in the "r" line of the bridge network status, so there is no mandatory
reason for parsing bridge descriptors. But the functionality described
in this section is still implemented in case we need data from the
bridge descriptor in the future.
Bridge descriptor files may contain one or more bridge descriptors.
We expect bridge descriptor to contain at least the following lines in
the stated order:
"@purpose" SP purpose NL
"router" SP nickname SP IP SP ORPort SP SOCKSPort SP DirPort NL
["opt" SP] "fingerprint" SP fingerprint NL
BridgeDB parses the purpose, IP, ORPort, and fingerprint from these
lines.
BridgeDB skips bridge descriptors if the fingerprint is not contained
in the bridge network status parsed before or if the bridge does not
have the Running flag.
BridgeDB discards bridge descriptors which have a different purpose
than "bridge". BridgeDB can be configured to only accept descriptors
with another purpose or not discard descriptors based on purpose at
all.
BridgeDB memorizes the IP addresses and OR ports of the remaining
bridges.
If there is more than one bridge descriptor with the same fingerprint,
BridgeDB memorizes the IP address and OR port of the most recently
parsed bridge descriptor.
If BridgeDB does not find a bridge descriptor for a bridge contained in
the bridge network status parsed before, it removes that bridge from
the set of bridges to be given out to bridge users.
2. Assigning bridges to distributors
A "distributor" is a mechanism by which bridges are given (or not
given) to clients. The current distributors are "email", "https",
and "unallocated".
BridgeDB assigns bridges to distributors based on an HMAC hash of the
bridge's ID and a secret and makes these assignments persistent.
Persistence is achieved by using a database to map node ID to
distributor.
Each bridge is assigned to exactly one distributor (including
the "unallocated" distributor).
BridgeDB may be configured to support only a non-empty subset of the
distributors specified in this document.
BridgeDB may be configured to use different probabilities for assigning
new bridges to distributors.
BridgeDB does not change existing assignments of bridges to
distributors, even if probabilities for assigning bridges to
distributors change or distributors are disabled entirely.
3. Giving out bridges upon requests
Upon receiving a client request, a BridgeDB distributor provides a
subset of the bridges assigned to it.
BridgeDB only gives out bridges that are contained in the most recently
parsed bridge network status and that have the Running flag set.
BridgeDB may be configured to give out a different number of bridges
(typically 3) depending on the distributor.
BridgeDB may define an arbitrary number of rules saying that a certain
number of bridges should have a given OR port or a given bridge relay
flag.
4. Selecting bridges to be given out based on IP addresses
BridgeDB may be configured to support one or more distributors that
gives out bridges based on the requestor's IP address. Currently, this
is how the HTTPS distributor works.
The goal is to avoid handing out all the bridges to users in a similar
IP space and time.
# Someone else should look at proposals/ideas/old/xxx-bridge-disbursement
# to see if this section is missing relevant pieces from it. -KL
BridgeDB fixes the set of bridges to be returned for a defined time
period.
BridgeDB considers two IP addresses coming from the same /24 as the
same IP address and return the same set of bridges.
BridgeDB divides the IP address space equally into a small number of
areas (typically 4) and return different results to requests coming
from these areas.
# I found that BridgeDB is not strict in returning only bridges for a
# given area. If a ring is empty, it considers the next one. Is this
# expected behavior? -KL
# I also found that BridgeDB does not make the assignment to areas
# persistent in the database. So, if we change the number of rings, it
# will assign bridges to other rings. I assume this is okay? -KL
BridgeDB maintains a list of proxy IP addresses and returns the same
set of bridges to requests coming from these IP addresses.
The bridges returned to proxy IP addresses do not come from the same
set as those for the general IP address space.
BridgeDB can be configured to include bridge fingerprints in replies
along with bridge IP addresses and OR ports.
The current algorithm is as follows. An IP-based distributor splits
the bridges uniformly into a set of "rings" based on an HMAC of their
ID. Some of these rings are "area" rings for parts of IP space; some
are "category" rings for categories of IPs (like proxies). When a
client makes a request from an IP, the distributor first sees whether
the IP is in one of the categories it knows. If so, the distributor
returns an IP from the category rings. If not, the distributor
maps the IP into an "area" (that is, a /24), and then uses an HMAC to
map the area to one of the area rings.
Once the IP-based distributor knows what ring it is handing out bridges
from, it maps the current "epoch" (N-hour period) and the IP's area
(/24) to a point in the ring based on HMAC, and hands out bridges at
that point.
"Mapping X to Y based on an HMAC" above means one of the following:
- We keep all of the elements of Y in some order, with a mapping
from all 160-bit strings to positions in Y.
- We take an HMAC of X using some fixed string as a key to get a
160-bit value. We then map that value to the next position of Y.
When giving out bridges based on a position in a ring, BridgeDB first
looks at flag requirements and port requirements. For example,
BridgeDB may be configured to "Give out at least L bridges with port
443, and at least M bridges with Stable, and at least N bridges
total." To do this, BridgeDB adds to the results:
- The first L bridges in the ring after the position that have the
port 443, and
- The first M bridges in the ring after the position that have the
flag stable and that it has not already decided to give out, and
- The first N-L-M bridges in the ring after the position that it
has not already decided to give out.
5. Selecting bridges to be given out based on email addresses
BridgeDB can be configured to support one or more distributors that are
giving out bridges based on the requestor's email address. Currently,
this is how the email distributor works.
The goal is to bootstrap based on one or more popular email service's
sybil prevention algorithms.
# Someone else should look at proposals/ideas/old/xxx-bridge-disbursement
# to see if this section is missing relevant pieces from it. -KL
BridgeDB rejects email addresses containing other characters than the
ones that RFC2822 allows.
BridgeDB may be configured to reject email addresses containing other
characters it might not process correctly.
BridgeDB rejects email addresses coming from other domains than a
configured set of permitted domains.
BridgeDB normalizes email addresses by removing "." characters and by
removing parts after the first "+" character.
BridgeDB can be configured to discard requests that do not have the
value "pass" in their X-DKIM-Authentication-Result header or does not
have this header. The X-DKIM-Authentication-Result header is set by
the incoming mail stack that needs to check DKIM authentication.
BridgeDB does not return a new set of bridges to the same email address
until a given time period (typically a few hours) has passed.
# Why don't we fix the bridges we give out for a global 3-hour time period
# like we do for IP addresses? This way we could avoid storing email
# addresses. -KL
# The 3-hour value is probably much too short anyway. If we take longer
# time values, then people get new bridges when bridges show up, as
# opposed to then we decide to reset the bridges we give them. (Yes, this
# problem exists for the IP distributor). -NM
# I'm afraid I don't fully understand what you mean here. Can you
# elaborate? -KL
BridgeDB can be configured to include bridge fingerprints in replies
along with bridge IP addresses and OR ports.
BridgeDB periodically discards old email-address-to-bridge mappings.
BridgeDB rejects too frequent email requests coming from the same
normalized address.
To map previously unseen email addresses to a set of bridges, BridgeDB
proceeds as follows:
- It normalizes the email address as above, by stripping out dots,
removing all of the localpart after the +, and putting it all
in lowercase. (Example: "[email protected]" becomes
- It maps an HMAC of the normalized address to a position on its ring
of bridges.
- It hands out bridges starting at that position, based on the
port/flag requirements, as specified at the end of section 4.
6. Selecting unallocated bridges to be stored in file buckets
# Kaner should have a look at this section. -NM
BridgeDB can be configured to reserve a subset of bridges and not give
them out via one of the distributors.
BridgeDB assigns reserved bridges to one or more file buckets of fixed
sizes and write these file buckets to disk for manual distribution.
BridgeDB ensures that a file bucket always contains the requested
number of running bridges.
If the requested number of bridges in a file bucket is reduced or the
file bucket is not required anymore, the unassigned bridges are
returned to the reserved set of bridges.
If a bridge stops running, BridgeDB replaces it with another bridge
from the reserved set of bridges.
# I'm not sure if there's a design bug in file buckets. What happens if
# we add a bridge X to file bucket A, and X goes offline? We would add
# another bridge Y to file bucket A. OK, but what if A comes back? We
# cannot put it back in file bucket A, because it's full. Are we going to
# add it to a different file bucket? Doesn't that mean that most bridges
# will be contained in most file buckets over time? -KL
7. Writing bridge assignments for statistics
BridgeDB can be configured to write bridge assignments to disk for
statistical analysis.
The start of a bridge assignment is marked by the following line:
"bridge-pool-assignment" SP YYYY-MM-DD HH:MM:SS NL
YYYY-MM-DD HH:MM:SS is the time, in UTC, when BridgeDB has completed
loading new bridges and assigning them to distributors.
For every running bridge there is a line with the following format:
fingerprint SP distributor (SP key "=" value)* NL
The distributor is one out of "email", "https", or "unallocated".
Both "email" and "https" distributors support adding keys for "port"
and "flag" and the port number and flag name as values to indicate that
a bridge matches certain port or flag criteria of requests.
The "https" distributor also allows the key "ring" with a number as
value to indicate to which IP address area the bridge is returned.
The "unallocated" distributor allows the key "bucket" with the file
bucket name as value to indicate which file bucket a bridge is assigned
to.