antelope.txt
==========
Wed Apr 04 14:00:53 -0700 2018
Context for this jam session:
(1) ARMA project to modernize an Excel-based study; implying a redo of the calrecycle app
(2) Allbirds potential project: "formalizing our approach to 'product footprints'... imagining a simple excel model with ~30 assumptions we can flex." ==> implying: no, you want a redone calrecycle app
(3) Eco Data Science meetup on R-Shiny, with free opensource, paid hosting, and high-dollar enterprise solutions for shiny apps.
Now then: Moving parts in antelope:
* we have the frontend, which is 100% javascript and can be provided from a dumb http server
- we have config, which is currently bundled in the js but could easily be user-specific
- the config could, furthermore, also easily contain API keys
= the actual frontend app needs to be more or less rebuilt from scratch
* we have the foreground, which is the BasicArchive that stores fragments, flows, and quantities that make up the product model
** the scenarios are still homeless **
* we have the catalog, which is necessary to instantiate the foreground
- and the resources, which are necessary to compute the foreground
The big question right now is where to store the scenarios? scenarios are per-privileged-user (though the argument is that every user should be able to test hypotheses, given a model)
To me, that really means that every instance- every session-- is a live antelope server, spun-up from scratch, and volatile. The model author can write her own scenarios, and they would get stored natively within the foreground. Anyone who clones the foreground would get all those scenarios, and also could add their own. But for that person to save their scenarios, would require that person to have their own storage, distinct from that of the model author.
The workflow would look like this:
* User logs in to service
= user's account includes a toybox of models, each of which includes:
- a semantic foreground reference (username.project.module[.sub].foreground)
- an API key which is actually a signed JWT that proves authorization + specifies access level
= user selects a model to play with
: server creates an AntelopeV1 container
: server clones the specified foreground, which is equivalent to a github repo, into the container
OKAY STOP. The current Av1 server requires a full catalog, and not just a basic archive, and that is because the catalog needs to have resource files (each with their own access tokens) in order to compute the foreground. Fine. It's still a github repo, but it's a larger one.
what if upstream foregrounds get updated? ans: the updates get a new semantic reference. Propagating the changes to the fork becomes equivalent to a git merge-- not trivial, but at least a familiar problem.
this is easier anyway because then each model is atomic / unitary.
= foreground publication itself has to implement access control
: each foreground creates a public/private keypair; sends the public key to the auth server
: owner is specified; owner key is generated by encoding owner's id with the private key
: owner key is not changed with a fork-- thus preventing the forker from generating new keys
: only owner is permitted to generate new keys.
: owner requests key by providing recipient's user id. key is generated by encoding the recipient's id with the private key, thus tying it to the recipient (and preventing the auth server from hijacking the key)
: new key gets stored in the foreground and sent to the auth server, which generates a JWT, and registers it with the billing server, NOT KNOWING THE RECIPIENT
: key specifies access as 'agg', 'fg', or 'full'
-- 'agg' indicates stage aggregation + lcia-level access (private)
-- 'fg' indicates non-aggregated foreground + lcia-level background
-- 'full' indicates non-aggregated foreground + exchange-level background
--- probably need a theory for this
: Av1 server then sends the token to the recipient
= when an authorized user connects, he performs a GET request with his user ID as a q param and the JWT as the AUTH payload. antelope v1 server:
: validates the token
: notes the access level
: confirms that the encrypted user ID matches the key
: confirms that the key is authorized
= an authorized user (agg, fg, full) may clone the foreground for the purposes of parameterization.
: stage-level aggregation is not really protective of foreground data if exchange parameterization is allowed
: agg level must prohibit exchange parameterization (only permits terminal parameterization)
: the clone must have the auth information scrubbed of all keys not belonging to the requestor
: clone is REQUIRED to parameterize? seems a bit heavyweight. it's not though if git clone --depth=1. the AUOMA repo, which INCLUDES the calrecycle model, is only 2.7 MB and that includes 1MB of reports + eps. uolca is about 1mb all told. and that's a big model.
: so I think clone is good- especially since we don't need to save the image unless the owner has an account.
= The last complicated bit is the private key. We need a way of storing that that is persistent, but still protects it from the auth server / service provider, which owns all the repos.
: I don't know how to solve that. for that we need some consultation.
: https://stackoverflow.com/questions/11575398
: https://softwareengineering.stackexchange.com/questions/205606
: https://medium.freecodecamp.org/how-to-securely-store-api-keys-4ff3ea19ebda
: TL;DR: BlackBox, Docker Secrets
OK, I think we have this worked out.
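A minimal sketch of the access rules in the workflow above ('agg' / 'fg' / 'full', agg prohibiting exchange parameterization, and the clone being scrubbed of other people's keys). The integer ordering of the levels, the function names, and the shape of the auth info (recipient user id -> key record) are all assumptions made for illustration, not anything that exists in the Av1 server.

```python
from enum import IntEnum


class Access(IntEnum):
    """Access levels from the notes; the integer ordering is an assumption."""
    AGG = 1   # stage aggregation + lcia-level access
    FG = 2    # non-aggregated foreground + lcia-level background
    FULL = 3  # non-aggregated foreground + exchange-level background


def may_parameterize(level, target):
    """target is 'terminal' or 'exchange' (hypothetical labels)."""
    if target == 'terminal':
        return True
    if target == 'exchange':
        # agg level must prohibit exchange parameterization
        return level >= Access.FG
    raise ValueError('unknown parameterization target: %r' % (target,))


def scrub_keys(auth_info, requestor_id):
    """Return a copy of the auth info holding only the requestor's own key.

    auth_info is assumed to map recipient user id -> key record.
    """
    return {uid: key for uid, key in auth_info.items() if uid == requestor_id}
```

Usage would be along the lines of may_parameterize(Access.AGG, 'exchange') returning False, and scrub_keys() being applied to the auth information before the clone is handed to the requestor.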
So in order to bring this about:
- we need the Av1 server
- we need the foreground publication
- we need an auth server with encryption capabilities
In order to replicate the CalRecycle model we also need:
- Av2 servers with privacy protection
==========
Thu Mar 29 13:58:23 -0700 2018
The conceptual center of the antelope framework is a small set of distinct, covering INTERFACES. (see also [interfaces.md])
Let's talk through the USED OIL example, in its truest implementation.
We have:
- USLCI as open background database
* how is it configured? it has to be configured by the model author; but maybe the data user would like to see the effects of different configurations (to wit: different allocation approaches). Currently that sort of thing is NOT SUPPORTED because the USLCI datasets are pre-rolled from GaBi
* HOWEVER, the way one could imagine doing it is by (1) creating an alternative USLCI (new semantic origin) that has the modified allocation treatment and (2) applying alternative background terminations using SCENARIOS. This is fully possible, and in fact suggests that the data provider can easily provide alternative system models, just as ecoinvent does, but notes that the utility of those alternative system models is much greater if the semantic origin (and not the external reference) is the only thing that changes.
- Ecoinvent and PE data as private contributed data
* the prior condition was that exchange data were not exposed in any query
* but the private data still needed to be hosted within the antelope instance, which ultimately meant that the antelope data store became private.
* instead, the data should be hosted elsewhere and the antelope server should have a resource file that provides access to it. The resource file should (as it currently does) designate the privacy, and the antelope server, rather than *enforcing* privacy, should simply *report it*. The outside host should be responsible for enforcing the privacy, saying in effect "access with that resource token only provides aggregated results."
* where is this instantiated? The remote av2 server which hosts, say, the thinkstep SP24 datasets, will process a request with an attached, signed, JWT that authorizes the requestor to access the background interface with a designated privacy flag. the request has a privacy level associated with it, and that level is determined during the authorization step.
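A rough sketch of what "the outside host enforces the privacy" could look like on the remote side. It assumes the privacy level has already been read from the validated JWT, and that level 0 means open while anything higher means aggregated-only; that convention, the function name, and the data shapes are all made up for illustration.

```python
def answer_background_query(exchanges, factors, privacy):
    """Answer an LCIA query at the privacy level attached to the request.

    exchanges: list of (flow, amount) pairs
    factors:   dict of flow -> characterization factor
    privacy:   integer level taken from the token (0 assumed to mean open)
    """
    score = sum(amount * factors.get(flow, 0.0) for flow, amount in exchanges)
    if privacy == 0:
        # open data: the exchange list may be exposed along with the score
        return {'privacy': privacy, 'exchanges': list(exchanges), 'score': score}
    # private data: only the aggregated result leaves the host
    return {'privacy': privacy, 'score': score}
```

The response carries the privacy level back with it, so the antelope server can report the privacy of the result without having to enforce it.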
An issue to think about: UUID as distinct from semantic reference
e.g. in ecoinvent, it is perfectly desirable for every column of every system model of every minor version to have a truly unique UUID, and it is UNDESIRABLE to manufacture UUID collisions by using the same uuid and different origins for alternatively-configured versions of the same activity.
This is exactly the reason for having the external reference be the main identifying key (and really, either the external ref or the uuid should be accepted)
what this means is that the data provider needs to maintain an authoritative list of external references, and then a mapping of those references to UUIDs within the system models, so that an entered reference like ecoinvent.3.2.apos/market_for_flow_[geography] and ecoinvent.3.2.cutoff/market_for_flow_[geography] both resolve to the correct [same] process
this means that the EntityStore needs to support this mapping, needs to obtain it from somewhere, needs to enable data providers to manage it. that way a user will be able to easily parameterize a flow termination by simply changing the origin without changing the reference (assuming the reference still exists in the new origin)
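A minimal sketch of the mapping described above: an authoritative list of external references, plus a per-origin mapping to UUIDs, so that swapping the origin and keeping the reference lands on the right entity. ReferenceMap and its methods are hypothetical and are not the actual EntityStore; the UUID strings in the usage note are placeholders.

```python
class ReferenceMap:
    """Maps (origin, external_ref) to a UUID. Hypothetical illustration only."""

    def __init__(self):
        self._refs = set()   # authoritative list of external references
        self._uuids = {}     # (origin, external_ref) -> uuid

    def register(self, origin, external_ref, uuid):
        self._refs.add(external_ref)
        self._uuids[(origin, external_ref)] = uuid

    def resolve(self, semantic_ref):
        """Resolve 'origin/external_ref' to the UUID used within that origin."""
        origin, _, external_ref = semantic_ref.partition('/')
        if external_ref not in self._refs:
            raise KeyError('unknown external reference: %s' % external_ref)
        if (origin, external_ref) not in self._uuids:
            raise KeyError('%s not present in %s' % (external_ref, origin))
        return self._uuids[(origin, external_ref)]

    def retarget(self, semantic_ref, new_origin):
        """Change only the origin, assuming the reference exists there."""
        _, _, external_ref = semantic_ref.partition('/')
        if (new_origin, external_ref) not in self._uuids:
            raise KeyError('%s not present in %s' % (external_ref, new_origin))
        return '%s/%s' % (new_origin, external_ref)
```

With 'market_for_flow_[geography]' registered under both ecoinvent.3.2.apos and ecoinvent.3.2.cutoff, each with its own UUID, retarget() turns the apos reference into the cutoff one and resolve() returns the cutoff UUID, with no collision between the two.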
==========
Thu Mar 22 13:25:52 -0700 2018
In our new architecture, we have a number of different pieces to develop:
* A web infrastructure for providing information on (a) individual studies and (b) databases (open source, public, backend, NAL supported) [Antelope v1/v2 server]
* A research infrastructure for creating and reviewing product system models (open source, public, frontend + backend, UCSB supported) [js model editor; interacts w Av2 server]
* A data infrastructure for intelligent access, retrieval, and review of data (closed source, commercial, frontend + backend, vault.lc) [js data reviewer; interacts w Av2 servers and proprietary backend [e.g. graph / tag db]]
* A free model interactor (open source, public, frontend, interacts with Av1) [CalRecycle FrontEnd]
* a premium model interactor + publisher
Thu 2018-03-22 13:59:30 -0700
The security model for the Av1 server is unclear (as ever).
How is the Av1 server used?
- Study authors use it to publish studies to specific audiences, or to the general public
= study content should not be available to someone who is not authorized by the study creator
- authorized users may create their own scenarios, which *could or could not* need to be stored on the server
= scenario specifications and results should not be available to people not authorized by the scenario creator
In the basic (non-privacy-preserving) imagining, the av1 server is housed in a container which is enclosed within a private cloud. Users access a wrapper service which handles authentication and forwards authorized requests (internally) to the servers that can answer them.
* here the wrapper service is aware of the contents of all requests and presumably has access to all resources in the internal cloud
* this is not desirable if people want to use the service to share secret models with collaborators
Therefore, the privacy-preserving case requires that the av1 server be able to negotiate authorization with users directly. This means that the study author needs to be able to specify access rights in a way that enables a 3rd party (vault.lc) to adjudicate access without being able to obtain access.
Thu 2018-03-22 15:23:43 -0700
OK, so the way I think this works is:
A: Study author; owns secret data
C: Client; granted authorization to view data or results
V: vault.lc; provides hosting and auth adjudication
R: repository to contain secret data; owned by hard private secure storage
+ A must have exclusive access to write to R
+ A must be able to mediate access to read from R (how?)
S: Antelope V1 server instance; created by V
0. A creates secret repo R, including an auth specification with:
- R contains distinct token per authorized client C
- R contains mapping of token to authorized views
- stored non-volatile in R and can be updated, replaced, or revoked by A
1. A registers address of repo @R with vault V [case 0: provides $$$]
2. V instantiates Av1 server S, gives it @R and id of A
- S generates pubkey pair, provides pubkey K_r,pub to V
3. S attempts to access R; A mediates access (how?)
- A grants persistent access (how?)
- R must be able to push out updates to S
4. A gives C her dedicated token
--- A no longer needs to participate, until a new Av1 is required
5. C contacts V with @R [case 1: provides $]
- V constructs JWT with prudent expiry and signs it with K_r,pub
6. V provides @S and JWT to C
7. C transmits request and JWT to S; supplies token as query param (alt.: uses token to establish session)
- S validates JWT: ensures
- S validates query token
- S answers request
Where do scenarios live? if S is stateful, there needs to be a way to make them persistent across instantiations
if they live on C, then S still has to be stateful, unless the full param list is specified with every query (seems wasteful)
I guess the easiest thing is for C to replicate the model and perform its own traversals to implement scenarios
Anyways, it's clear that the flask app needs to be able to receive and validate JWTs.
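A minimal sketch of steps 5 and 7 above (V constructs a JWT with an expiry; S validates the JWT and then the query token), using only the standard library. HS256 with a shared secret is used here as a stand-in for the key-pair scheme in the notes, which would need a crypto library. The function names, the claim layout, and the token -> views dict are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    return base64.urlsafe_b64encode(data).rstrip(b'=')


def _unb64url(data):
    # restore the padding that JWTs strip
    return base64.urlsafe_b64decode(data + b'=' * (-len(data) % 4))


def make_jwt(claims, secret):
    """Step 5: build and sign a token (HS256 stand-in for the real scheme)."""
    header = _b64url(json.dumps({'alg': 'HS256', 'typ': 'JWT'}).encode())
    payload = _b64url(json.dumps(claims).encode())
    signing_input = header + b'.' + payload
    sig = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return (signing_input + b'.' + _b64url(sig)).decode()


def validate_jwt(token, secret, now=None):
    """Step 7: check signature and expiry; return the claims or raise ValueError."""
    try:
        header, payload, sig = token.encode().split(b'.')
    except ValueError:
        raise ValueError('malformed token')
    expected = hmac.new(secret, header + b'.' + payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _unb64url(sig)):
        raise ValueError('bad signature')
    claims = json.loads(_unb64url(payload))
    if claims.get('exp', 0) <= (time.time() if now is None else now):
        raise ValueError('token expired')
    return claims


def authorized_views(query_token, auth_spec):
    """Look up C's dedicated token in R's token -> authorized views mapping."""
    return auth_spec.get(query_token, [])
```

In the flask app, validate_jwt() would presumably run before any route handler, with the JWT taken from the auth header and the dedicated token from the query string; a ValueError would map to a 401.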
==========
Wed Dec 06 10:03:19 -0800 2017
I have come up with at least five different things that "Antelope" means:
* the existing antelope v1 CLIENT interface for a finite set of integer-enumerated entities included in a collection of fragments
* the SERVER for that, which may or may not be the same as the fragment builder
* the antelope v2 server, which is meant to be a data clearinghouse that sits on top of an LcCatalog and can translate semantic.ref/entity/query into serialized results
* the antelope v2 CLIENT, which allows me to talk to a remotely stashed ecoinvent so that I don't need it on my local machine
* the antelope node server, which acts like an antelope v2 server for a single semantic endpoint and MAY or MAY NOT include Qdb capabilities.
On top of that, there are two persistent issues that are causing me anxiety moving forward:
- is_elementary and the compartment manager in general was always a stopgap (written in the West Branch library one morning in 2016?) and could be either (a) better aligned with synlist or (b) part of a newly reimagined graph-based Qdb
- flows properly being flowables, compartments being shifted to EXCHANGE TERMINATIONS, which would be a radical reimagining of pretty much everything.
I'm really excited about reducing flows to flowables, but it would break compatibility with just about everything, starting with the existing antelope v1 (not to mention the J Cleaner paper) which steadfastly applied a category to every flow / declared all flows have compartments. [J Cleaner fairly situated compartments as distinct semantic entities]
So-- if compartments are distinct semantic entities-- do we need to store them in archives? If compartments are terminations, does that mean they are really processes??
NO, they are not processes, or if they are they are the mothers of all multifunctional processes.
It wouldn't be meaningful to call a compartment a process, because in order to do LCIA on it the process would have to be allocated (by CF) across all flows into that compartment that have impacts, and suddenly we are right back to having an exploded set of entities.
How would this even work? everything would have to get reimagined. Background emissions, which are presently flow + direction, would need to be flow + termination (with direction implicit??? )
right? there's a deeper problem- the implicit natural directionality of compartments, of contexts. we've already had to start dealing with that in characterization.set_natural_direction() [which must be supplied a compartment manager] -- well so we've already acknowledged it. that's not to say we've solved it.
I think I need to spend some time thinking about how Qdb is supposed to work in this brave new world
Wed 2017-12-06 11:59:40 -0800
Back to antelope servers. We're going to keep the current system of having flows be about paired flowable + context, but we want the antelope interface to be forward thinking. So let's go through the API and make sure that it makes sense.
Well.. api.md is rather hopelessly out of date. It can be simplified a lot.
Wed 2017-12-06 13:18:06 -0800
Worked on this for too long... FWIW I really need to get going on Swagger. I am putting this down, eating some food, and then doing my important TODO for the day.
==========
Fri Jan 05 09:33:43 -0800 2018
Long month...
Swagger spec for antelope v2 is coming along great--
reading about JWT right now as the most likely form of authentication for the resource to use
Good discussion of shortcomings, mainly that there is a single secret key (in our case per resource) that can be compromised:
https://medium.com/@rahulgolwalkar/pros-and-cons-in-using-jwt-json-web-tokens-196ac6d41fb4
Also helpful comparison of JWT and OAuth, which are totally different:
http://www.seedbox.com/en/blog/2015/06/05/oauth-2-vs-json-web-tokens-comment-securiser-un-api/
lots of discussion etc etc:
https://auth0.com/blog/ten-things-you-should-know-about-tokens-and-cookies/
http://www.bubblecode.net/en/2016/01/22/understanding-oauth2/
https://auth0.com/blog/critical-vulnerabilities-in-json-web-token-libraries/
These people are (a) self-styled crypto badasses and (b) dead-set against all things JOSE
https://paragonie.com/blog/2017/03/jwt-json-web-tokens-is-bad-standard-that-everyone-should-avoid
All that said, JWTs seem like a perfect (at least near-term) solution to our particular security problem, particularly public-key-based JWTs, because they allow the resources to autonomously validate requests without any communication with the server.
The problem: if the tokens are ever supposed to be limited-use, then EITHER the resource has to check with the auth server OR the resource has to become stateful. There is no third option. the TOKEN can't be stateful.
Since we're adamant about resources being stateless, and since we ALREADY operate an auth server, I think we should just go that route.
The JWTs should have three different types:
- unrestricted access with a privacy level (integer?)-- strong expiry-- server may still need to be contacted for revocation checks
- metered access (notifies auth server per query for billing-- server returns 200 OK or 401 if revoked)
- limited access (asks auth server per query for sufficiency-- server returns 200 OK or 401 insufficient)
Both of the latter two do away with the "federated" benefit of the JWT, but they preserve query privacy.
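A small sketch of the auth server's side of those three token types, keeping all the state on the auth server so that the resource and the token stay stateless. AuthServerStub is an in-memory stand-in; the 'type' claim and the attribute names are hypothetical ('jti' is the standard JWT token-id claim).

```python
class AuthServerStub:
    """In-memory stand-in for the auth server. All names are hypothetical."""

    def __init__(self):
        self.revoked = set()   # token ids that have been revoked
        self.usage = {}        # token id -> queries billed (metered)
        self.remaining = {}    # token id -> queries left (limited)

    def check(self, claims):
        """Return an HTTP-like status for one query made with these claims."""
        tid, kind = claims['jti'], claims['type']
        if tid in self.revoked:
            return 401
        if kind == 'unrestricted':
            return 200
        if kind == 'metered':
            # count the query for billing
            self.usage[tid] = self.usage.get(tid, 0) + 1
            return 200
        if kind == 'limited':
            if self.remaining.get(tid, 0) <= 0:
                return 401   # insufficient
            self.remaining[tid] -= 1
            return 200
        return 401
```

The resource only ever sends the decoded claims, never the query itself, which is what preserves query privacy.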
Here's how the system will work:
PLAYERS:
AS Authentication server: vault.lc
RP Resource Provider: ecoinvent.org
RC Resource Container: stateless Antelope V2 server: eiv3.2.apos
UU User: Ralph Fishnet
CA Client application: antelope container
STAGES:
Stage 1. Initialize resource
Stage 2. Authenticate session
Stage 3. Query resource
STEPS:
Stage 1: Initialize Resource
* RP approaches AS with data to be made available as a resource
* AS creates RC with one-time session key
* RC creates asymmetric keypair
* RC transmits public key with session key to AS
public+private key pair for RC
$ AS creates stateless RC and seeds it with the "public" key which remains secret
* AS deploys RC
Stage 2: Authenticate session
(mode a: user is already an RP licensee)
* UU approaches AS and logs in to RP via OAuth2
* OAuth2 grant establishes UU's account status
* AS generates an unrestricted JWT with a 1-day expiry
* AS provides a resource file to grant CA data access to RC using JWT
(mode b: user is a vault.lc user with pay-per-use for protected resources)
$ monthly billing
* UU logs into account and invokes CA
* AS provides resource file supporting index interface to RC
* AS provides resource file supporting data proxy interface to RC
(mode c: user is a vault.lc user with an ecoinvent limited-use)
$ during monthly billing, AS generates a limit JWT with a 1-month expiry and a query limit
* UU logs into account and invokes CA
* AS provides resource file supporting index interface to RC
* AS provides resource file to grant CA data access to RC using limit JWT
* AS provides resource file supporting data proxy interface to RC
Stage 3: Query Resource
(mode a: user is already an ecoinvent licensee)
* UU submits query
* CA performs query to RC, using unlimited-use JWT (DWR! not revocable!)
* RC receives query; decodes JWT; validates permission; [optionally increments auth server]; answers query
* CA receives query result
* UU is happy
(mode b: user is a vault.lc user with an ecoinvent pay-per-use)
* UU submits query
$ CA performs query; proxy negotiates payment
* proxy returns a JWT
* CA forwards query with JWT to RC
* RC receives query; decodes JWT; consumes token [notifies auth server]; answers query
* CA receives query result
* UU is happy
(mode c: user is a vault.lc user with an ecoinvent limited-use)
* UU submits query
* CA performs query
* RC receives query; decodes JWT; validates permission; increments auth server; answers query
* CA receives query result
* UU is happy
- if limit is exhausted, CA removes resource containing limit JWT; go to mode b
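A sketch of the client-side fallthrough at the end of mode c: when the resource holding the limit JWT is exhausted, CA drops it and goes on to the next resource (e.g. the pay-per-use proxy of mode b). Resources are modeled as plain callables returning (status, result), which is purely an assumption for illustration.

```python
def query_with_fallback(resources, query):
    """Try each resource in order until one answers.

    resources: ordered, mutable list of callables returning (status, result).
    A resource that does not answer 200 is removed from the list, as in
    "CA removes resource containing limit JWT; go to mode b".
    """
    while resources:
        status, result = resources[0](query)
        if status == 200:
            return result
        resources.pop(0)
    raise RuntimeError('no resource could answer the query')
```

The list is modified in place, so an exhausted resource stays gone for subsequent queries.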
RC Constructor:
To create an RC, I need: a one-time key, an auth server...
Thu 2018-01-11 10:01:04 -0800
see notes in spiral notebook.
==========
Thu Jan 11 13:34:46 -0800 2018
Components required for an Antelope V2 Server:
- AntelopeV2Server - main constructor
:param auth_server:
:param auth_pubkey:
:param privacy:
:param qdb_server:
:param archive_init_args:
- AntelopeQuery - replaces CatalogQuery
- new QuantityKey - replaces QuantityRef - drop-in as argument in call to load_lcia_factors()
:param qdb_server:
- flask config + route-to-query mapping
I think that could be more or less it.
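For reference, the constructor from the list above as a skeleton. Only the parameter names come from the notes; the defaults and the meanings given in the comments are guesses, and nothing here is implemented.

```python
class AntelopeV2Server:
    """Constructor skeleton only; defaults and comments are assumptions."""

    def __init__(self, auth_server, auth_pubkey, privacy=0, qdb_server=None,
                 archive_init_args=None):
        self.auth_server = auth_server    # where to reach the auth server (assumed)
        self.auth_pubkey = auth_pubkey    # key for validating incoming JWTs (assumed)
        self.privacy = privacy            # privacy level reported for this resource
        self.qdb_server = qdb_server      # source of LCIA factors (assumed)
        # copied so that later changes by the caller do not leak in
        self.archive_init_args = dict(archive_init_args or {})
```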
Thu 2018-03-22 13:25:41 -0700
This is updated in deadtree format in journal