Capture source data that cannot be mapped #652

znatty22 · 2022-12-02T20:01:38Z

Problem

Sometimes source data has columns that cannot be mapped to any of the existing concepts, but we want to capture that information anyway. We should have a way to do this.

Proposed Solution

Add another data structure to the extract config called additional_info.
Each entry in this data structure is a dict that describes one source column that cannot be mapped to a concept:

additional_info = [
    {
        "column_name": "marital_status",
        "column_type": "boolean",
        "notes": "Whether the participant in the study is married or not",
        "concept": "PARTICIPANT"
    },
    # ... one entry per column that cannot be mapped
]
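
Since these entries are hand-written in each extract config, a small check for missing keys could catch mistakes early. The following is only a sketch: the function name `validate_additional_info` and the assumption that all four keys shown above are required are hypothetical and not part of the proposal.

```python
# Hypothetical: assumes every entry must carry the four keys from the example above.
REQUIRED_KEYS = {"column_name", "column_type", "notes", "concept"}


def validate_additional_info(additional_info):
    """Return a list of error messages; an empty list means all entries look well formed."""
    errors = []
    for i, entry in enumerate(additional_info):
        if not isinstance(entry, dict):
            errors.append(f"entry {i}: expected a dict, got {type(entry).__name__}")
            continue
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            errors.append(f"entry {i}: missing keys {sorted(missing)}")
    return errors
```

An extract config whose additional_info passes this check would return an empty list, and the extract stage could raise or log otherwise.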

During the extract stage of the ingest pipeline, the additional_info structure will be evaluated for each extract config so that a table like the following can be built and written to disk as one of the outputs of the extract stage:

| source file | source column | source column type | concept | notes |
| --- | --- | --- | --- | --- |
| clinical.tsv | marital_status | boolean | PARTICIPANT | Whether the participant in the study is married or not |

This is a single table that will contain all of the data in the ingest package that cannot be mapped.
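
As a rough illustration of how the extract stage might assemble that table, here is a minimal sketch. It assumes the additional_info lists have already been collected into a dict keyed by source file name; the function names, the input shape, and the choice of TSV output are all hypothetical and not existing ingest library APIs.

```python
import csv
import io

# Column order follows the example table above.
COLUMNS = ["source file", "source column", "source column type", "concept", "notes"]


def build_unmapped_table(additional_info_by_file):
    """Flatten {source_file: additional_info} into one list of table rows."""
    rows = []
    for source_file, entries in additional_info_by_file.items():
        for entry in entries:
            rows.append({
                "source file": source_file,
                "source column": entry["column_name"],
                "source column type": entry["column_type"],
                "concept": entry["concept"],
                "notes": entry["notes"],
            })
    return rows


def write_unmapped_table(rows, handle):
    """Write the rows as tab-separated text to an open text file handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Usage with the single entry from the example above.
example = {
    "clinical.tsv": [
        {
            "column_name": "marital_status",
            "column_type": "boolean",
            "notes": "Whether the participant in the study is married or not",
            "concept": "PARTICIPANT",
        }
    ]
}
buffer = io.StringIO()
write_unmapped_table(build_unmapped_table(example), buffer)
```

In a real extract stage the handle would be a file opened in the stage's output directory rather than an in-memory buffer.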

znatty22 added the feature New functionality label Dec 2, 2022

znatty22 changed the title ~~Capture source data that has no obvious concept mapping~~ Capture source data that cannot be mapped Dec 2, 2022
