Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wip] updating reproschema commands to the new pydantic model #36

Merged
merged 49 commits into from
Jun 15, 2024

Conversation

djarecka
Copy link
Member

@djarecka djarecka commented Apr 5, 2024

@satra - for now I updated only validate, but can you please take a look if this is what you had in mind?

also, I wasn't sure where the context should be, for now I added to models

@satra
Copy link
Contributor

satra commented Apr 9, 2024

looks like a reasonable approach. a few comments.

  • i don't mind a temporary inclusion of the context, but ideally it should just be picked up from the release in reproschema.
  • perhaps change the pydantic model file to be model_autogen.py and then have a model.py that simply imports and adjusts (since pydantic 2.0 has an injection mode for modification) any of the autogen classes. validate can then import from model as it does.
  • reproschema2redcap and vice versa should also use these models
  • we will need some helper functions probably to reduce some boilerplate of instantiating these classes (like builders).

@djarecka
Copy link
Member Author

djarecka commented Apr 9, 2024

ok, I understand that all the changes that we were doing to the pydantic model, including changes the type langString to Dict[str, str], should be done here in model.py.

@djarecka
Copy link
Member Author

@satra - should we include in reproschema something like _ldmeta (from dandi schema) to save that a class is reproschema:Protocol, etc.?
Something would be needed if we want to keep tests like this

@satra
Copy link
Contributor

satra commented Apr 16, 2024

you already have that in the meaning/mapping/schema_uri components of the linkml schema. the generator does not add that to pydantic, and that may be good to add to the metadata. i thought you or puja was going to look into how to add other metadata to the generated code on the linkml side.

more specifically, those tests are really not helpful at the moment :)

@djarecka
Copy link
Member Author

you already have that in the meaning/mapping/schema_uri components of the linkml schema. the generator does not add that to pydantic, and that may be good to add to the metadata. i thought you or puja was going to look into how to add other metadata to the generated code on the linkml side.

Yes, indeed, there is a current PR that should add linkml_metadata automatically

Copy link
Contributor

@satra satra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we also do the things we discussed earlier in the thread (#36 (comment))?

raise Exception(f"expected a list or dictionary, got {data_el}")


def fixing_old_schema(data, copy_data=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we call this migrate_schema, include a version check, and set schemaVersion to 1.0.0?

@@ -0,0 +1,252 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this file needed in this repo?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that first we will release reproschema and after this I will point to the specific version

@@ -57,13 +55,11 @@ def validate_dir(directory, shape_file, started=False, http_kwargs={}):
return True


def validate(shapefile, path):
def validate(path):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this also check schemaVersion and either ask user to migrate or ask them to use the older shacl-based validator?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I thought that the idea was to use fixing_old_schema (or migrate_schema) for the older version. This is why I was checking entire reproschema-library with the new validation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but this function doesn't check anything, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it initialize the pydantic class, so will get an error if the arguments are off

@djarecka
Copy link
Member Author

could we also do the things we discussed earlier in the thread (#36 (comment))?

yes, I will work from the new redcap2reproschema scripts that @yibeichan updated

@djarecka
Copy link
Member Author

djarecka commented Jun 7, 2024

@yibeichan - can you check if the fields with compute makes sense. I checked the phq9 as Satra suggested and there is indeed no isVis there

@yibeichan
Copy link
Contributor

@djarecka yes, isVis should be False for compute items.

but the newly generated compute items are not correct. there are many repetitive items, you can see a lot of pex_bm_apa1_flag01 here. This is from pex_bm_apa_schema (I can't attach the file here but sent to your slack. this is generated from the HBCD csv I shared with you on slack last time)

{
            "jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
            "variableName": "pex_bm_apa1_flag01"
        },
        {
            "jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
            "variableName": "pex_bm_apa1_flag01"
        },
        {
            "jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
            "variableName": "pex_bm_apa1_flag01"
        },
        {
            "jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
            "variableName": "pex_bm_apa1_flag01"
        },
        {
            "jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
            "variableName": "pex_bm_apa1_flag01"
        },
        {
            "jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
            "variableName": "pex_bm_apa1_flag01"
        },
        {
            "jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
            "variableName": "pex_bm_apa1_flag01"
        },
        {
            "jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
            "variableName": "pex_bm_apa1_flag01"
        },
        {
            "jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
            "variableName": "pex_bm_apa1_flag01"

@djarecka
Copy link
Member Author

djarecka commented Jun 7, 2024

@yibeichan - ok, will check the repetition.

btw., I believe I also found some error with inputType in the code, and I started checking more, and I realized that in one of the csv you gave me there are many Field Types. DO you know what we should do with checkbox or file?

@yibeichan
Copy link
Contributor

@djarecka for checkbox we can use multipleChoice
for file, I found it in bridge2ai csv, where it requires pdf. we don't have equivalent fields in reproschema (we have audioOject, videoObject, image, contentUrl), maybe we can use something in schema.org, I can think about [DigitalDocument](https://schema.org/DigitalDocument) or [Thing](https://schema.org/Thing). what do you think? @satra

@djarecka
Copy link
Member Author

djarecka commented Jun 7, 2024

@yibeichan - just to be clear, we are talking about inputType, and this is the list I'm using as a reference what is supported by the ui.
I guess checkbox can be select, but don't see file.

Also not sure what are other options that you might have in other redcap cvs. I will add perhaps some exceptions, because now there are ignored

@satra
Copy link
Contributor

satra commented Jun 7, 2024

the demo protocol has a file upload item. perhaps check that out.

@djarecka
Copy link
Member Author

djarecka commented Jun 7, 2024

@djarecka yes, isVis should be False for compute items.
for compute there is no isVis

but the newly generated compute items are not correct. there are many repetitive items, you can see a lot of pex_bm_apa1_flag01 here. This is from pex_bm_apa_schema (I can't attach the file here but sent to your slack. this is generated from the HBCD csv I shared with you on slack last time)

can you check now. I think the loops in the codes was wrong and it went multiple times through rows. Now the codes runs much faster...

@yibeichan
Copy link
Contributor

yessss! it's way faster and all repetitive results have gone!! thank you!!

@yibeichan
Copy link
Contributor

@djarecka what are the remaining issues we need to fix before merging this PR?

@djarecka
Copy link
Member Author

djarecka commented Jun 7, 2024

the demo protocol has a file upload item. perhaps check that out.

yes, I found documentUpload, perhaps reproschema-ui should be updated to give this option (@yibeichan )

what about sql that I see in some rows in hbcd, there are usually some "labels", e.g. "illness label". In addition that I don't know how to represent in reproschema inputType (I believe we don't have sql), I'm guessing that this should be hidden?? I can even see in Field Annotation of the csv "@hidden", but in our code "Field Annotation" is translated to "description", so I'm a bit lost...

I the "Field Annotation" I can see multiple mysterious values, that I'm pretty sue are not human-readable descriptions, but have no ide what to do with them, e.g. "@hidden @NOW-SERVER" or "@noneoftheabove=5"

@yibeichan
Copy link
Contributor

yes, I found documentUpload, perhaps reproschema-ui should be updated to give this option (@yibeichan )

yes, it's in the UI: https://github.com/ReproNim/reproschema-ui/blob/05096f07b3c27e82ee9382849b073c0896448a47/src/components/Inputs/DocumentUpload/DocumentUpload.vue#L2 (is this what you need?)

what about sql that I see in some rows in hbcd, there are usually some "labels", e.g. "illness label". In addition that I don't know how to represent in reproschema inputType (I believe we don't have sql), I'm guessing that this should be hidden??

for sql we should treat it as the same we do for calc; it's a calculation with a different language. so we probably need to translate SQL to JS expressions or we extract the SQL and put it in compute for now but fix it later @satra what do you think?
and we should use isVis: false for sql too

I can even see in Field Annotation of the csv "@hidden", but in our code "Field Annotation" is translated to "description", so I'm a bit lost...

sorry about this, yes, it should not be translated to description directly. I found most rows in Field Annotation have @hidden so we should use Field Annotation for multiple attributes such as isVis, other attributes usage see below

I the "Field Annotation" I can see multiple mysterious values, that I'm pretty sue are not human-readable descriptions, but have no ide what to do with them, e.g. "@hidden @NOW-SERVER" or "@noneoftheabove=5"

yes, for @LANGUAGE-CURRENT-SURVEY we are supposed to specify something language related. It's row 1876, a radio for choosing language. so it's like depending on which language a participant chose, we need to display reproschema in that language (something like selected_language in the UI)

row 1877, it has @NOW it should be date related, same as @NOW-SERVER

row 1879, it has @IF([event-name][visit_start_complete] != 2, @DEFAULT='1', @DEFAULT='2') we should treat it as condition or branch logic

row 1886 @NONEOFTHEABOVE = '4,999,777', should be treated as condition/branch logic too.

row 2455 @CALCTEXT(if([pex_bm_healthv2_preg_i_illness_019_i_03] != "", [pex_bm_healthv2_preg_i_illness_019_i_03], [pex_bm_healthv2_preg_i_illness_019_i_05])) this item should be treated as compute.

that's a lot of variations, we probably need a dictionary about how to map them. I can take a look at all HBCD and bridge2ai CSV files and find unique values of Field Annotation and create a dictionary for you this weekend

@djarecka
Copy link
Member Author

djarecka commented Jun 8, 2024

@yibeichan @satra

I want to make two main points from HBCD:

  • I believe neither requirements that are marked as sql (e.g. SELECT distinct label from redcap_web_service_cache WHERE value in (SELECT ...., not marked just calc (e.g. $('#pex_bm_healthv2_preg_i_illness_016_i_03-tr select option:nth(1)').prop('selected',true)) doesn't look to me as something that could be just added to reproschema:compute
    An example from reproschema that I found has forms like this

  • for the branching logic, I found examples here, so the the form from "Branching Logic" in HBCD from that looks like [sed_bm_demo_roster_003] >= 1 AND [sed_bm_demo_roster_004_i_01] = '0' can't be directly included in isVis, but at least I know that it could be translated. However, for the items that could be found in Field Annotation I really have no idea what to do (well, @Hidden is easy to guess), I could add @NONEOFTHEABOVE = '4,999,777' to the reproschema schema, but I really do not think reproschema-ui will know what to do.

These issues are completely unrelated to my changes. I've been fixing multiple old issues with the code, since I'm getting newer examples, and I can keep doing this (if someone helps me with it), but perhaps this should be done in another PR?
However, this seems like critical issue, so if I add to compute or isVis that I see in cvs files no one should expect the schema to work with ui.

@yibeichan
Copy link
Contributor

  • I believe neither requirements that are marked as sql (e.g. SELECT distinct label from redcap_web_service_cache WHERE value in (SELECT ...., not marked just calc (e.g. $('#pex_bm_healthv2_preg_i_illness_016_i_03-tr select option:nth(1)').prop('selected',true)) doesn't look to me as something that could be just added to reproschema:compute
    An example from reproschema that I found has forms like this

you mean SQL cannot be directly put into jsExpression right? yes, I agree, that's why I asked "for sql we should treat it as the same we do for calc; it's a calculation with a different language. so we probably need to translate SQL to JS expressions or we extract the SQL and put it in compute for now but fix it later?"

  • for the branching logic, I found examples here, so the the form from "Branching Logic" in HBCD from that looks like [sed_bm_demo_roster_003] >= 1 AND [sed_bm_demo_roster_004_i_01] = '0' can't be directly included in isVis, but at least I know that it could be translated. However, for the items that could be found in Field Annotation I really have no idea what to do (well, @Hidden is easy to guess), I could add @NONEOFTHEABOVE = '4,999,777' to the reproschema schema, but I really do not think reproschema-ui will know what to do.

so among all HBCD csv files I have, Field Annotation has the following unique values: {'@NOW', '@NONEOFTHEABOVE', '@READONLY', '@IF', '@NOW-SERVER', '@LANGUAGE-CURRENT-SURVEY', '@CALCDATE', '@CALCTEXT', '@HIDDEN'}

@NOW, @NOW-SERVER should be something related to date, UI has something called [Timer](https://github.com/ReproNim/reproschema-ui/blob/05096f07b3c27e82ee9382849b073c0896448a47/src/components/Timer/Timer.vue#L51), could be related

@READONLY can be treated as description

@IF can be treated as branch logic

@CALCDATE has examples as @CALCDATE([screening_arm_1][setup_lmp], 0, "d") it should do some computing but yes, it's a complicated issue

@CALCTEXT has examples as @CALCTEXT(if([pex_bm_healthv2_preg_i_illness_015_i_03] != "", [pex_bm_healthv2_preg_i_illness_015_i_03], [pex_bm_healthv2_preg_i_illness_015_i_05])), we can put it into isVis but reproschema may not understand it

@NONEOFTHEABOVE has examples as @NONEOFTHEABOVE='12,999,777' this can will probably need to be specified based on the response option everytime it appears. but I'm not sure what this @noneoftheabove is used for later.

@LANGUAGE-CURRENT-SURVEY is related to participant's choice of language. we have language in the UI here but it's for landing page

@HIDDEN everytime we have it, we should set isVis: false

These issues are completely unrelated to my changes. I've been fixing multiple old issues with the code, since I'm getting newer examples, and I can keep doing this (if someone helps me with it), but perhaps this should be done in another PR? However, this seems like critical issue, so if I add to compute or isVis that I see in cvs files no one should expect the schema to work with ui.

I agree that those are complicated but critical issues. we should get this PR merged, release out, and open another PR to fix those issues

@yibeichan
Copy link
Contributor

@djarecka anything else we need to check/update in this PR? If not, I guess we can merge it and move on (to get the release out)

@djarecka
Copy link
Member Author

djarecka commented Jun 10, 2024 via email

@djarecka
Copy link
Member Author

@yibeichan - at the end I added many command to do the script redcap2reproschema, please take a look if these make sense to you.
I decided to explicitly add instruction for all elements to INPUT_TYPE_MAP, and VALUE_TYPE_MAP (previously the default behavior was to not apply any mapping, but that lead the elements that were completely ignored by reproschame) , so if something is not in the dictionaries, you will get an error, and that means that we have to decide how it should be represented in reproschema.

re. compute - maybe I wasn't clear on Sat, but I believe that even the elements in calc would not be interpreted by reproschema-ui, but for now I left it (and added comment, that should be also added to issues)

If that looks goo, one thing we should change for sure, is the path to contextfile

@yibeichan
Copy link
Contributor

@djarecka thanks for the work!! I tested nimh-minimal with a roundtrip: reproschema2redcap and redcap2reproschema. Everything looks good, except the branching logic.
In nimh-minimal, we have a special branching logic case. Whether participants will answer DSM5 youth or DSM5 parents depends on the interview age (an item from the previous activity). We had some problems making the ui work for this case, but @satra found a (temporary) solution for ui. So, it's reasonable that our current (general) reproschema2redcap and redcap2reproschema cannot catch this logic.

I'm going to merge this PR now

@yibeichan yibeichan merged commit 0988c20 into ReproNim:ref/linkml Jun 15, 2024
1 check passed
yibeichan added a commit that referenced this pull request Jun 15, 2024
* adding pydantic model from the reproschema ref/linkml branch

* fix altLabel

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates to the model: updating AllowedType, adding MissingType to value, updated to ResponseOption and image

* running pre-commit

* ENH: updating reproschema commands to the new pydantic model (#36)

* add print for testing

* update clear_header

* remove print

* fix order and other errors

* change ui yesno to radio

* fix typo

* update context, field->item, fix isVis

* remove useless due to failed validation

* remove visibility at the item level & remove matrixInfo

* fix choice

* remove identifier

* updating validate command to the new pydantic model

* updating/fixing the tests; updating the model to use CreativeWork; changes in formating

* fix conversion tests

* remove test output

* change test output directory

* final improvments on tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* model version after adding Thing class

* updating model after removing CreativeWork and ImageUrl

* adding tests to initialize the model classes

* fixing load_file; adding write_obj_jsonld function and expanding test_schema

* changing redcap2reproschema to use ned pydantic classes; some small changes to the pydantic model

* changing name from string to landstring with en as language

* fixing jsonld files

* Adding option to return compact schema to load_file

* fixing the protocol jsonld file

* changing reproschema2redcap to use the new model

* adding contectfile to write_obj_jsonld function and improving test; improving compact option for load_file

* fixing reproschema2redcap and tests

* removing file with the context and fixing references to the context_url (for now the link rfom the ref/linkm branch

* updating the reproschema2redcap to work for activity/items from urls

* improving error message for file_load and validate; checking the suffix of the file before treating it as jsonld

* fixing identify_model_class, so Item and Field are treated the same

* fixing reproschema2redcap so it reads responseOptions from another file

* rewriting parts of redcap2reproschema, fixing some bugs[wip]

* fixing compute: removing isvis condition

* fixing process_csv so it doesn't go multiple time through the same condition

* changes to input and value mapping (mapping explicitly or raising errors if not found); fixing choices and adding slider; adding sql to compute types (this does not work properly right now); adding many comments

* adding output for redcap2reproschema command; removing argparse

* model without decimal; revert changes to valueType in the model

* adding migrade command

* fixing multiple issues with redcap2rp and rp2redcap: adding compute, fixing preamble (can be either activity level or issue level

* WIP: addining test to test rp2redcap and redcap2repo using nimh exampl

* fixing paths in the tests, should run now

* ignore .DS_Store files in validation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: yibeichan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rename to be consistent

---------

Co-authored-by: Dorota Jarecka <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants