Index and order - qdb - lcia #7

Merged
merged 25 commits into from
Feb 1, 2022
25 commits
09ca45f
Generate index and ordering for resources
bkuczenski Jun 3, 2021
99182be
actually deal with resources and config
bkuczenski Jun 6, 2021
3c271cb
Use IndexAndOrder interactively to generate persistent config
bkuczenski Jun 8, 2021
e0346d4
Some minor API work
bkuczenski Jun 9, 2021
882dc9a
Merge branch 'main' into index_and_order
bkuczenski Jun 9, 2021
d5125b1
Finish prior work; static catalog setup
bkuczenski Jun 10, 2021
fd7d9b5
Move async tasks out of repo
bkuczenski Jun 14, 2021
1a8f2e9
Operable index API
bkuczenski Jun 14, 2021
c261c04
Most of exchange interface working
bkuczenski Jun 15, 2021
bc028b3
BG Finished! but problems remain
bkuczenski Jun 16, 2021
cab815a
Start building out the native quantity interface
bkuczenski Jun 25, 2021
3ef7e36
Try out OLCA ref data on xdb
bkuczenski Jun 26, 2021
bc7ebe3
Completed the set of response models (excl. foreground)
bkuczenski Jun 29, 2021
1c0876f
quantity work, mostly
bkuczenski Jul 8, 2021
b004e84
Split off + build out qdb router
bkuczenski Jul 15, 2021
2d71d01
Something went wrong w authorized query
bkuczenski Jul 17, 2021
ab861a8
Towards foreground LCIA over POST
bkuczenski Jul 17, 2021
cc1882a
clean up entity retrieval
bkuczenski Jul 21, 2021
75c6fef
Foreground LCIA
bkuczenski Jul 22, 2021
40e2bc4
Get Foreground LCIA working (POST exchanges route)
bkuczenski Jul 24, 2021
02ced3a
2 out of 3 LCIA routes operational
bkuczenski Jul 24, 2021
bdca102
post flow spec for factors- LCIA route 0
bkuczenski Jul 27, 2021
73b0c28
JWT Auth baby!
bkuczenski Sep 11, 2021
7e83820
publication doco bump
bkuczenski Feb 1, 2022
0534807
Get it running with the new auth
bkuczenski Feb 1, 2022
8 changes: 0 additions & 8 deletions Dockerfile
@@ -13,11 +13,3 @@ RUN make test
RUN make install
WORKDIR /project

FROM python:3 as dagster
RUN apt-get update
RUN apt-get install -y unzip curl
RUN pip install dagster dagster-aws dagster-shell dagit
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
RUN unzip awscliv2.zip
RUN ./aws/install
WORKDIR /project/etl
20 changes: 17 additions & 3 deletions README.md
@@ -1,9 +1,23 @@
# xdb
Exchange database
Antelope Exchange database

This repo contains a deployable antelope server that is exposed via a REST API. All the exchange data are static, so a REST model is appropriate.
This is a REST HTTP server for hosting LCA data according to the [Antelope interface](https://github.com/AntelopeLCA/antelope).
This repo contains a deployable antelope server that is exposed via a REST API. All the exchange data are static, so a REST model is appropriate.

The server is linked to an authentication and authorization mechanism that would evaluate each request in terms of the requester's access level.
Every query must be accompanied by an authorization token that has been computed as indicated in
[xdb_tokens.py](https://github.com/AntelopeLCA/antelope/blob/virtualize/antelope/xdb_tokens.py).
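
A minimal sketch of attaching a token to a query, assuming the token travels as an HTTP bearer token in the `Authorization` header. That header scheme is an assumption, not something stated here; the token computation itself lives in `xdb_tokens.py` and is not reproduced. The base URL, the placeholder token, and the `build_authorized_request` helper are hypothetical, and the example assumes the API root sits at the server root.

```python
from urllib.request import Request


def build_authorized_request(base_url: str, path: str, token: str) -> Request:
    """Build (but do not send) a GET request carrying an authorization token.

    Assumption: the server expects "Authorization: Bearer <token>".
    """
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return Request(url, headers={"Authorization": f"Bearer {token}"})


# Hypothetical usage: list the known origins on a locally running server.
req = build_authorized_request("http://localhost:8000", "origins", "<token>")
print(req.full_url)
```

Sending the request (for example with `urllib.request.urlopen(req)`) is left out so that the sketch runs without a live server.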



## Run the server

The
From the root directory, run:

$ ANTELOPE_CATALOG_ROOT=/data/LCI/my_container uvicorn api:app --host 0.0.0.0 --reload


The server should be linked to an authentication and authorization mechanism that would evaluate each request in terms of the requestor's access level.

## config

1 change: 1 addition & 0 deletions api/__init__.py
@@ -0,0 +1 @@
from .api import app
205 changes: 112 additions & 93 deletions api/antelope_api.md
@@ -31,14 +31,50 @@ A tuple of (origin, external reference) specifies a distinct entity. In the eve

### server-wide queries

APIv2_ROOT/ - return server metadata, incl list of origins
APIv2_ROOT/origins - return list of known origins

When I think of more, I'll put them here.


### Basic Entity queries

These types of queries do not depend on any particular interface access.

Entity-specific queries:

APIv2_ROOT/[origin]/[entity id] - return a thorough description of the entity
APIv2_ROOT/[origin]/[entity id]/reference - return a unitary reference*
APIv2_ROOT/[origin]/[process id]/references - return list of reference exchanges
APIv2_ROOT/[origin]/[flow id]/unit - unit string of the flow's reference quantity
APIv2_ROOT/[origin]/[flow id]/context - the flow's full context as a list (or empty list)

* A quantity's reference is a unit (string); a flow's reference is a quantity record. Processes are constituted to have zero or more reference exchanges, though most have only one. If a process has a single reference exchange (or if a unitary reference is somehow designated), it will be returned; otherwise a 404 is returned with the message "No Reference" or "Multiple References".

On the other hand, non-processes with unitary references can always be returned as single-entry lists, so the `references` query will never return an error for a valid entity.
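
The unitary-reference rule described above can be sketched as follows. This is an illustration of the stated behavior only: the `NoUnitaryReference` exception and the `unitary_reference` helper are hypothetical names, not the server's actual code.

```python
from typing import List, TypeVar

T = TypeVar("T")


class NoUnitaryReference(Exception):
    """Hypothetical error carrying the HTTP status and message to return."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status
        self.message = message


def unitary_reference(references: List[T]) -> T:
    """Return the single reference, or fail with a 404-style error.

    Zero references -> "No Reference"; more than one -> "Multiple References".
    """
    if len(references) == 1:
        return references[0]
    if not references:
        raise NoUnitaryReference(404, "No Reference")
    raise NoUnitaryReference(404, "Multiple References")
```

The `references` query, by contrast, would simply return the list itself, which is why it never fails for a valid entity.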

The basic interface can also be used to compute LCIA results, provided (a) the xdb server has access to a background
implementation for the named process; (b) the xdb server recognizes the quantity and can query against it; and (c) the
query credentials are authorized by the qdb that is consulted for the query. The query does NOT require background
or even exchange access to run this route, but if background access is not present, then only a summary LCIA result
(i.e. no details) will be returned.

quantity is known to the local qdb:

APIv2_ROOT/[origin]/[process id]/lcia/[quantity id] - perform LCIA on process LCI
APIv2_ROOT/[origin]/[process id]/[ref flow]/lcia/[quantity id]

quantity is known in a remote qdb (xdb must be able to resolve the resource)

APIv2_ROOT/[origin]/[process id]/lcia/[quantity origin]/[quantity id]
APIv2_ROOT/[origin]/[process id]/[ref flow]/lcia/[quantity origin]/[quantity id]
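
As an illustration, the four LCIA route shapes above can be assembled by one small client-side helper. The `lcia_path` function and its `root` parameter are hypothetical; the actual value of `APIv2_ROOT` is not given here, so it defaults to an empty prefix.

```python
from typing import Optional
from urllib.parse import quote


def lcia_path(origin: str, process_id: str, quantity_id: str,
              ref_flow: Optional[str] = None,
              quantity_origin: Optional[str] = None,
              root: str = "") -> str:
    """Build the path for an LCIA query.

    ref_flow is included only when given; quantity_origin is included only
    when the quantity lives in a remote qdb.
    """
    parts = [origin, process_id]
    if ref_flow is not None:
        parts.append(ref_flow)
    parts.append("lcia")
    if quantity_origin is not None:
        parts.append(quantity_origin)
    parts.append(quantity_id)
    # Percent-encode each segment so identifiers containing "/" stay intact.
    return root.rstrip("/") + "/" + "/".join(quote(p, safe="") for p in parts)
```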


### Index queries

Origin-specific queries:


APIv2_ROOT/[origin]/<entities> - list entity records; query to search
APIv2_ROOT/[origin]/processes
APIv2_ROOT/[origin]/flows
@@ -48,33 +84,27 @@ Origin-specific queries:

# would these be better as /processes/count?
APIv2_ROOT/[origin]/count - dict containing count of all entity types
APIv2_ROOT/[origin]/count/<entityes> - int reporting count of specified entity type

These are not implemented at the API, but could be:

APIv2_ROOT/[origin]/count/<entities> - int reporting count of specified entity type
APIv2_ROOT/[origin]/count/processes - /count/process synonym
APIv2_ROOT/[origin]/count/flows - /count/flow synonym
APIv2_ROOT/[origin]/count/quantities
APIv2_ROOT/[origin]/count/contexts
APIv2_ROOT/[origin]/count/flowables

APIv2_ROOT/[origin]/synonyms/[term] - list synonyms for the specified term
APIv2_ROOT/[origin]/synonyms?term=term - "" ""
APIv2_ROOT/[origin]/get_context/[term] - return canonical full context for term, as a list
APIv2_ROOT/[origin]/get_context?term=term - "" ""

Entity-specific queries:

APIv2_ROOT/[origin]/[entity id] - return a thorough description of the entity
APIv2_ROOT/[origin]/[entity id]/reference - return a unitary reference*
APIv2_ROOT/[origin]/[process id]/references - return list of reference exchanges
APIv2_ROOT/[origin]/[flow id]/unit - unit string of the flow's reference quantity
APIv2_ROOT/[origin]/[flow id]/context - the flow's full context as a list (or empty list)
APIv2_ROOT/[origin]/[flow id]/targets - return reference exchanges containing the flow
APIv2_ROOT/[origin]/[context]/parent - context's parent or none
APIv2_ROOT/[origin]/[context]/sense - context's parent or none
APIv2_ROOT/[origin]/[context]/subcontexts - list of subcontexts

* A quantity's reference is a unit (string); a flow's reference is a quantity record. Processes are constituted to have zero or more reference exchanges, though most have only one. If a process has a single reference exchange (or if a unitary reference is somehow designated), it will be returned; otherwise a 404 is returned with the message "No Reference" or "Multiple References".
APIv2_ROOT/[origin]/contexts/[context] - get_context() implementation - includes parent, sense, subcontexts

On the other hand, non-processes with unitary references can always be returned as single-entry lists, so the `references` query will never return an error for a valid entity.
These are not implemented at the API, but could be:

APIv2_ROOT/[origin]/contexts/[context]/parent - context's parent or none
APIv2_ROOT/[origin]/contexts/[context]/sense - context's sense or none
APIv2_ROOT/[origin]/contexts/[context]/subcontexts - list of subcontexts

### Documentary queries

@@ -144,88 +174,77 @@ Only in cases where processes have a single designated reference exchange, may t

All background aspect queries return lists of exchanges, either reference exchanges (value always 1) or dependent exchanges (normalized to reference exchange). The "aspects" are as follows:

APIv2_ROOT/[origin]/[process id]/[ref flow]/consumers - [reference exchanges]
APIv2_ROOT/[origin]/[process id]/[ref flow]/dependencies - [exchange values]
APIv2_ROOT/[origin]/[process id]/[ref flow]/emissions - [exchange values]
APIv2_ROOT/[origin]/[process id]/[ref flow]/cutoffs - [exchange values]
APIv2_ROOT/[origin]/[process id]/[ref flow]/lci - [exchange values]
APIv2_ROOT/[origin]/[process id]/[ref flow]/sys_lci - [exchange values]
APIv2_ROOT/[origin]/[process id]/[ref flow]/foreground - [exchange values]
APIv2_ROOT/[origin]/[process id]/[ref flow]/ad - [exchange values]
APIv2_ROOT/[origin]/[process id]/[ref flow]/bf - [exchange values]
APIv2_ROOT/[origin]/[process id]/[ref flow]/consumers - [reference exchanges]
APIv2_ROOT/[origin]/[process id]/[ref flow]/dependencies - [IntermediateExchanges]
APIv2_ROOT/[origin]/[process id]/[ref flow]/emissions - [ElementaryExchanges]
APIv2_ROOT/[origin]/[process id]/[ref flow]/cutoffs - [CutoffExchanges]
APIv2_ROOT/[origin]/[process id]/[ref flow]/lci - [ElementaryExchanges] + [CutoffExchanges]
APIv2_ROOT/[origin]/[process id]/[ref flow]/sys_lci - [ElementaryExchanges] + [CutoffExchanges]
APIv2_ROOT/[origin]/[process id]/[ref flow]/foreground - [exchange values]
APIv2_ROOT/[origin]/[process id]/[ref flow]/ad - [IntermediateExchanges]
APIv2_ROOT/[origin]/[process id]/[ref flow]/bf - [ElementaryExchanges] + [CutoffExchanges]
APIv2_ROOT/[origin]/[process id]/[ref flow]/lci/[dep flow] - [ElementaryExchanges]

Only flows terminated to *elementary* contexts are emissions; other flows (both unterminated and terminated to intermediate contexts) are "cutoffs".
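
The emission/cutoff distinction can be sketched as a simple classification. This is a hedged illustration: the `ExteriorFlow` fields follow the return-type summary in this document, but the `classify` function, the tuple representation of contexts, and the set of elementary contexts are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

ContextPath = Tuple[str, ...]


@dataclass(frozen=True)
class ExteriorFlow:
    origin: str
    flow: str
    direction: str
    context: Optional[ContextPath]  # None means the flow is unterminated


def classify(flow: ExteriorFlow, elementary: Set[ContextPath]) -> str:
    """Return "emission" only for flows terminated to an elementary context.

    Unterminated flows and flows terminated to intermediate contexts are
    both reported as "cutoff".
    """
    if flow.context is not None and flow.context in elementary:
        return "emission"
    return "cutoff"


# Hypothetical data: one elementary context, three exterior flows.
elementary_contexts = {("Emissions", "to air")}
flows = [
    ExteriorFlow("my.data.source", "co2", "Output", ("Emissions", "to air")),
    ExteriorFlow("my.data.source", "scrap", "Output", ("Waste treatment",)),
    ExteriorFlow("my.data.source", "steam", "Input", None),
]
for f in flows:
    print(f.flow, classify(f, elementary_contexts))
```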

### Quantity queries

There are two layers to the quantity engine: the native layer, which includes only strict mappings and does not perform
any reconciliation, and the qdb layer, which does reconcile the native data with a set of canonical terms. The
native-layer queries are answered by the individual archives and their TermManagers, while the qdb-layer queries
are answered by the catalog's LciaEngine. The qdb layer is also (someday) going to be implemented by a stand-alone
qdb which implements a graph database.

Some things worth noting:
* The native-layer quantity queries are a subset of the full quantity interface, i.e. some queries only make sense
in the context of reconciliation across data sources (for instance, the factors POST method).
* The native layer is nominally read-only (comes from a static XDB data source), but the qdb layer can grow
* Identification of elementary flows is not reliable for native queries, only for qdb queries (because it depends
on reconciling contexts)

**Native-layer quantity queries**

APIv2_ROOT/[origin]/synonyms?term=term - list synonyms for the specified term
APIv2_ROOT/[origin]/contexts/[term] - return canonical full context for term

APIv2_ROOT/[origin]/[flow id]/profile - list characterizations for the flow
APIv2_ROOT/[origin]/[flow id]/cf/[quantity id] - return the characterization value as a float (or 0.0)

APIv2_ROOT/[origin]/[quantity id]/norm - return a normalization dict
APIv2_ROOT/[origin]/[quantity id]/factors - list characterizations for the quantity
APIv2_ROOT/[origin]/[quantity id]/convert/[flow id] - return a QuantityConversion
APIv2_ROOT/[origin]/[quantity id]/convert/[flowable]/[ref quantity] - return a QuantityConversion

APIv2_ROOT/[origin]/[quantity id]/lcia {POST} - perform LCIA on POSTDATA = list of exchange refs


## Summary of return types:

* String
* Integer
* Float
* EntityRecord - origin, entity ID, entity type, name
* RichEntityRecord - EntityRecord + search key, search value (*for use in answering a search query*)
* Context - name, parent, sense
* Reference Exchange - origin, process, flow, direction, locale[, comment]
* Exchange - origin, process, flow, termination, locale[, comment]
* ExchangeValue - Exchange + value
* ExteriorFlow - origin, flow, direction, termination
Meta-types
* ServerMeta - info about the xdb
* OriginMeta - available / authorized interfaces
* OriginCount - maybe part of OriginMeta?

Basic/Index types
* Entity - origin, entity ID, entity type, properties
* FlowEntity - Entity + context, locale, referenceQuantity
* Context - name, parent, elementary, sense, subcontexts

Exchange/Background types
* ExteriorFlow - origin, flow, direction (W/R/T interior), context
* Exchange - origin, process, flow, direction, termination, type, comment, str
* ReferenceExchange - reference=True, termination=None
* ReferenceValue - ReferenceExchange + value
* ExchangeValues - Exchange + multiple values, one per reference, + uncertainty
* AllocatedExchange - Exchange + ref_flow + value + uncertainty

Quantity types
* Characterization - origin, flowable, ref quantity, query quantity, context, dict of locale:value
* Normalization - origin, quantity, dict of locale: value
* QuantityConversion - basically a QRResult: origin, flowable, ref qty, q qty, context, locale, value
* LciaDetailedResult
* LciaAggregation
* LciaResult

I think that's all of them.
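
For illustration, the Basic/Index types listed above could be modeled as plain dataclasses. This is only a sketch: the field names are snake_case guesses derived from the list, the field types are assumptions, and the server's actual response models may differ.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Entity:
    origin: str
    entity_id: str
    entity_type: str
    properties: Dict[str, Any]


@dataclass
class FlowEntity(Entity):
    # Entity + context, locale, referenceQuantity
    context: List[str]
    locale: str
    reference_quantity: str


@dataclass
class Context:
    name: str
    parent: Optional[str]
    elementary: bool
    sense: Optional[str]
    subcontexts: List[str]


# Hypothetical record, serialized to a plain dict for a JSON response.
flow = FlowEntity(
    origin="my.data.source",
    entity_id="f001",
    entity_type="flow",
    properties={"name": "carbon dioxide"},
    context=["Emissions", "to air"],
    locale="GLO",
    reference_quantity="mass",
)
print(asdict(flow))
```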

## A key question for Return Data

For these queries, I have to decide whether and how to state the entity's origin in the response. The client must already know the origin because it is part of the request, so does re-stating it waste bandwidth? Or is that not a concern because of gzip?

Query: `APIv2_ROOT/my.data.source/processes?name=aluminium`

Option 1: explicit, full:

[
{
"origin": "my.data.source",
"entityId": "4xad",
"entityType": "process",
"name": "Aluminium casting plant"
},
{
"origin": "my.data.source",
"entityId": "4xae",
"entityType": "process",
"name": "Aluminium smelting plant"
},
...
]

Option 2: explicit, nested

{
"origin": "my.data.source",
"processes": [
{
"entityId": "4xad",
"entityType": "process",
"name": "Aluminium casting plant"
},
{
"entityId": "4xae",
"entityType": "process",
"name": "Aluminium smelting plant"
},
....
]
}

Option 3: unspecified (implicit, most compact):

[
{
"entityId": "4xad",
"entityType": "process",
"name": "Aluminium casting plant"
},
{
"entityId": "4xae",
"entityType": "process",
"name": "Aluminium smelting plant"
},
...
]
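
One way to inform the bandwidth question is to measure the three layouts directly. The sketch below builds each option from the same synthetic records and prints raw and gzip-compressed sizes. The records are made up for the measurement, the helper names are hypothetical, and no particular outcome is claimed; real payloads may behave differently.

```python
import gzip
import json
from typing import Any, Dict, List, Tuple

Record = Dict[str, str]


def explicit(origin: str, records: List[Record]) -> List[Record]:
    """Option 1: repeat the origin in every record."""
    return [{"origin": origin, **r} for r in records]


def nested(origin: str, records: List[Record], key: str = "processes") -> Dict[str, Any]:
    """Option 2: state the origin once and nest the records under it."""
    return {"origin": origin, key: list(records)}


def sizes(payload: Any) -> Tuple[int, int]:
    """Return (raw bytes, gzip-compressed bytes) of the JSON encoding."""
    raw = json.dumps(payload).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


# Synthetic records, invented purely for the size comparison.
records = [
    {"entityId": f"id{i:04d}", "entityType": "process", "name": f"Aluminium plant {i}"}
    for i in range(200)
]
options = {
    "explicit": explicit("my.data.source", records),
    "nested": nested("my.data.source", records),
    "implicit": records,  # Option 3: origin left unspecified
}
for label, payload in options.items():
    raw, packed = sizes(payload)
    print(f"{label}: {raw} bytes raw, {packed} bytes gzipped")
```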