One of the fundamental concepts behind OCA is data harmonization. To harmonize data is to unify two or more datasets to be compliant to each other so that further processing can be performed on unified data. Harmonization may take place on the dataset level, where as a result all the datasets become unified to the same schema. Another type of harmonization is to achieve a compliant subset among schemas. The process to achieve harmonization is called a transformation. Transformation overlays are the recipe to how actually data (or schema) should be transformed.
OCA conductor
serves dual purpose:
- data validation where provided data record is verified against schema defined within
OCA Bundle
; - data transformation where injected datasets are transformed to the new state using transformation overlays.
OCA conductor
is written in Rust thanks to which most C-based languages can be supported by exposing proper FFI-layer. We provide bindings for the following languages:
EUPL 1.2
We have distilled the most crucial license specifics to make your adoption seamless: see here for details.
[dependencies]
oca_conductor = "0.2.14"
- For
oca-transformer
npm i oca-data-transformer
- For
oca-validator
npm i oca-data-validator
Prerequisites:
Install Rust - https://www.rust-lang.org
Clone repository and go to bindings/ffi/validator
directory
git clone https://github.com/THCLab/oca-conductor.git
cd bindings/ffi/validator
Run
cargo build --release --features validator
cargo run --bin uniffi-bindgen generate src/validator.udl -l python
cp target/release/libvalidator.so src/libuniffi_validator.so
All you need are validator.py
and libuniffi_validator.so
files from src
directory.
Example usage:
from validator import *
oca_str = """
{
"capture_base": {
"attributes": {
"n1": "Text",
"n2": "Numeric"
},
"classification": "",
"digest": "ElNWOR0fQbv_J6EL0pJlvCxEpbu4bg1AurHgr_0A7LKc",
"flagged_attributes": [],
"type": "spec/capture_base/1.0"
},
"overlays": [
{
"attribute_entry_codes": {
"n1": ["op1", "op2"]
},
"capture_base": "ElNWOR0fQbv_J6EL0pJlvCxEpbu4bg1AurHgr_0A7LKc",
"digest": "E4L-BukSBsqZoDDIJvw4_gGjAJs5It4UUfiA200lGup0",
"type": "spec/overlays/entry_code/1.0"
},
{
"attribute_conformance": {
"n1": "M"
},
"capture_base": "ElNWOR0fQbv_J6EL0pJlvCxEpbu4bg1AurHgr_0A7LKc",
"digest": "E4L-BukSBsqZoDDIJvw4_gGjAJs5It4UUfiA200lGup0",
"type": "spec/overlays/conformance/1.0"
}
]
}
"""
validator = Validator(oca_str)
validator.set_constraints(ConstraintsConfig(fail_on_additional_attributes=True))
try:
validator.validate(
"""
[
{
"n1": "op",
"additional": 1
},
{
"n2": 2
}
]
"""
)
except ValidationErrors.List as error_list:
for error in error_list.errors:
print(error)
# Result:
# ValidationError.Other(data_set='0', record='0', attribute_name='n1', message='\'n1\' value ("op") must be one of ["op1", "op2"]')
# ValidationError.Other(data_set='0', record='0', attribute_name='additional', message='unknown_attribute')
# ValidationError.Other(data_set='0', record='1', attribute_name='n1', message='missing_attribute')
An Attribute Mapping Overlay defines attribute mappings between two distinct data models. The example below shows the schema manipulation of the source attribute name and the destination attribute name.
{
"attribute_mapping": {
"firstName": "first_name",
"surname": "last_name"
},
"capture_base": "EzMTX96ZkwRBzkUBI9c8jiXfHa6CfKJyrd6uH6vjIUZc",
"digest": "EBfxGsWV4aQGRfRsVH_tDfQCFaBnNqA7cq8YW1Q3DdwA",
"type": "spec/overlays/mapping/1.0"
}
An Entry Code Mapping Overlay defines entry key mappings between two distinct code tables (aka lookup tables) or datasets. The example below shows the mapping between three letter country codes into two letter country codes.
{
"attribute_entry_codes_mapping": {
"country_code": [
"AF:AFG","AL:ALB","DZ:DZA","AS:ASM","AD:AND","AO:AGO","AI:AIA","AQ:ATA","AG:ATG","AR:ARG","AM:ARM","AW:ABW","AU:AUS","AT:AUT","AZ:AZE","BS:BHS","BH:BHR","BD:BGD","BB:BRB","BY:BLR","BE:BEL","BZ:BLZ","BJ:BEN","BM:BMU","BT:BTN","BO:BOL","BQ:BES","BA:BIH","BW:BWA","BV:BVT","BR:BRA","IO:IOT","BN:BRN","BG:BGR","BF:BFA","BI:BDI","CV:CPV","KH:KHM","CM:CMR","CA:CAN","KY:CYM","CF:CAF","TD:TCD","CL:CHL","CN:CHN","CX:CXR","CC:CCK","CO:COL","KM:COM","CD:COD","CG:COG","CK:COK","CR:CRI","HR:HRV","CU:CUB","CW:CUW","CY:CYP","CZ:CZE","CI:CIV","DK:DNK","DJ:DJI","DM:DMA","DO:DOM","EC:ECU","EG:EGY","SV:SLV","GQ:GNQ","ER:ERI","EE:EST","SZ:SWZ","ET:ETH","FK:FLK","FO:FRO","FJ:FJI","FI:FIN","FR:FRA","GF:GUF","PF:PYF","TF:ATF","GA:GAB","GM:GMB","GE:GEO","DE:DEU","GH:GHA","GI:GIB","GR:GRC","GL:GRL","GD:GRD","GP:GLP","GU:GUM","GT:GTM","GG:GGY","GN:GIN","GW:GNB","GY:GUY","HT:HTI","HM:HMD","VA:VAT","HN:HND","HK:HKG","HU:HUN","IS:ISL","IN:IND","ID:IDN","IR:IRN","IQ:IRQ","IE:IRL","IM:IMN","IL:ISR","IT:ITA","JM:JAM","JP:JPN","JE:JEY","JO:JOR","KZ:KAZ","KE:KEN","KI:KIR","KP:PRK","KR:KOR","KW:KWT","KG:KGZ","LA:LAO","LV:LVA","LB:LBN","LS:LSO","LR:LBR","LY:LBY","LI:LIE","LT:LTU","LU:LUX","MO:MAC","MG:MDG","MW:MWI","MY:MYS","MV:MDV","ML:MLI","MT:MLT","MH:MHL","MQ:MTQ","MR:MRT","MU:MUS","YT:MYT","MX:MEX","FM:FSM","MD:MDA","MC:MCO","MN:MNG","ME:MNE","MS:MSR","MA:MAR","MZ:MOZ","MM:MMR","NA:NAM","NR:NRU","NP:NPL","NL:NLD","NC:NCL","NZ:NZL","NI:NIC","NE:NER","NG:NGA","NU:NIU","NF:NFK","MP:MNP","NO:NOR","OM:OMN","PK:PAK","PW:PLW","PS:PSE","PA:PAN","PG:PNG","PY:PRY","PE:PER","PH:PHL","PN:PCN","PL:POL","PT:PRT","PR:PRI","QA:QAT","MK:MKD","RO:ROU","RU:RUS","RW:RWA","RE:REU","BL:BLM","SH:SHN","KN:KNA","LC:LCA","MF:MAF","PM:SPM","VC:VCT","WS:WSM","SM:SMR","ST:STP","SA:SAU","SN:SEN","RS:SRB","SC:SYC","SL:SLE","SG:SGP","SX:SXM","SK:SVK","SI:SVN","SB:SLB","SO:SOM","ZA:ZAF","GS:SGS","SS:SSD","ES:ESP","LK:LKA","SD:SDN","SR:SUR","SJ:SJM","SE:SWE","CH:CHE","SY:SYR","TW:TWN","TJ:TJK","TZ:TZA","TH:THA","TL:TLS","TG:TGO","TK:TKL","TO:TON","TT:TTO","TN:TUN","TR:TUR","TM:TKM","TC:TCA","TV:TUV","UG:UGA","UA:UKR","AE:ARE","GB:GBR","UM:UMI","US:USA","UY:URY","UZ:UZB","VU:VUT","VE:VEN","VN:VNM","VG:VGB","VI:VIR","WF:WLF","EH:ESH","YE:YEM","ZM:ZMB","ZW:ZWE","AX:ALA"
]
},
"capture_base": "EzMTX96ZkwRBzkUBI9c8jiXfHa6CfKJyrd6uH6vjIUZc",
"digest": "EnBjuRoJs8TmfCQJdF8BdmmHwEUbCRrw45xwIFCS7qdY",
"type": "spec/overlays/entry_code_mapping/1.0"
}
A Unit Mapping Overlay defines target units when converting between different units of measurement. The example below demands measurement recalculation into mmol/L
unit, where the source unit is known from the Unit Overlay of the OCA Bundle.
{
"attribute_units": {
"blood_glucose": "mmol/L"
},
"capture_base": "EzMTX96ZkwRBzkUBI9c8jiXfHa6CfKJyrd6uH6vjIUZc",
"digest": "EwpEtyZgBF0B9KXEufQQdeS2aN1GL4JIMcpVRffH3_pg",
"metric_system": "nonSI",
"type": "spec/overlays/unit/1.0"
}