Capture source data that cannot be mapped #652

Open
znatty22 opened this issue Dec 2, 2022 · 0 comments
Labels
feature New functionality

Comments

znatty22 commented Dec 2, 2022

Problem

Sometimes source data has columns that cannot be mapped to any of the existing concepts, but we still want to capture that information. We need a way to do this.

Proposed Solution

  • Add another data structure, called additional_info, to the extract config.
  • Each entry in this data structure will be a dict that describes one column that cannot be mapped:
additional_info = [
    {
        "column_name": "marital_status",
        "column_type": "boolean",
        "notes": "Whether the participant in the study is married or not",
        "concept": "PARTICIPANT"
    },
   ....
]
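
A minimal sketch of how such a structure might be checked before it is used, assuming all four keys shown above are required. The helper name `validate_additional_info` and the `REQUIRED_KEYS` constant are hypothetical and are not part of the existing library:

```python
# Assumption: every entry must carry exactly the keys shown in the example above.
REQUIRED_KEYS = {"column_name", "column_type", "notes", "concept"}


def validate_additional_info(additional_info):
    """Return a list of error strings; an empty list means the structure looks valid."""
    if not isinstance(additional_info, list):
        return ["additional_info must be a list of dicts"]

    errors = []
    for i, entry in enumerate(additional_info):
        if not isinstance(entry, dict):
            errors.append(f"entry {i}: expected a dict, got {type(entry).__name__}")
            continue
        # Report missing keys in sorted order so messages are deterministic.
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            errors.append(f"entry {i}: missing keys {sorted(missing)}")
    return errors
```

Returning a list of messages rather than raising on the first problem would let the extract stage report every malformed entry in a config at once.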

During the extract stage of the ingest pipeline, the additional_info structure in each extract config will be evaluated. The results will be combined into a table like the following, which will be written to disk as one of the outputs of the extract stage:

| source file | source column | source column type | concept | notes |
|---|---|---|---|---|
| clinical.tsv | marital_status | boolean | PARTICIPANT | Whether the participant in the study is married or not |

This is a single table that will contain all of the data in the ingest package that cannot be mapped.
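
The table-building step could be sketched roughly as follows. This assumes the extract stage can supply a mapping from each source file name to that config's additional_info list; the function names, the `COLUMNS` constant, and the tab-separated output format are illustrative assumptions rather than existing library behavior:

```python
import csv
import io

# Assumption: output column names mirror the example table above.
COLUMNS = ["source file", "source column", "source column type", "concept", "notes"]


def build_unmapped_table(configs):
    """Flatten {source_file: additional_info} into one list of row dicts.

    `configs` is a hypothetical shape: source file name -> additional_info list.
    """
    rows = []
    for source_file, additional_info in configs.items():
        for entry in additional_info:
            rows.append({
                "source file": source_file,
                "source column": entry["column_name"],
                "source column type": entry["column_type"],
                "concept": entry["concept"],
                "notes": entry["notes"],
            })
    return rows


def write_unmapped_table(rows, handle):
    """Write the rows as tab-separated text to any file-like object."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Usage: one extract config for clinical.tsv with a single unmapped column.
example_configs = {
    "clinical.tsv": [
        {
            "column_name": "marital_status",
            "column_type": "boolean",
            "notes": "Whether the participant in the study is married or not",
            "concept": "PARTICIPANT",
        }
    ]
}
buffer = io.StringIO()
write_unmapped_table(build_unmapped_table(example_configs), buffer)
```

Writing to a file-like object keeps the sketch independent of wherever the extract stage actually stores its outputs.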

znatty22 added the "feature" (New functionality) label Dec 2, 2022
znatty22 changed the title from "Capture source data that has no obvious concept mapping" to "Capture source data that cannot be mapped" Dec 2, 2022