-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Formal specification of Circe cohort definitions #156
Comments
Not at the moment, but this could be provided by generating a JavaDoc from this class: https://github.com/OHDSI/circe-be/blob/master/src/main/java/org/ohdsi/circe/cohortdefinition/CohortExpression.java. This class definition should be considered the 'formal specification'. We are considering describing the Circe CohortExpression object with a json schema, such that you could use something like this R package that would allow you to provide a JSON document and check for validation issues related to a schema specification. To do what you want (is a json object 'valid' for a cohort expression) you can We'll leave this issue open as a ask to convert the cohort expression object model into a json schema specification. |
@ablack3 , would something like the attached file work? I'm not happy with some of the generated names, but it will validate a json file, I think. |
Yes I think so. I was mostly interested for improving my understanding of what was allowed in circe cohort definitions but was also thinking of writing an R function to validate cohort JSON which might be useful for Capr (assuming it does not already exist). Thanks! |
So, with this schema, you can use the R package jsonvalidate to see if the json you generate conforms with the schema I provided. Want to give it a shot? |
Oh great. Yes I'll give it a try with the phenotype library cohorts. |
I think it works! All 797 phenotype library cohorts pass the test. # validate cohort json files
library(magrittr)
schemaPath <- "C:/Users/Adam Black/Google Drive/OHDSI_R_packages/PhenotypeLibrary/work/CohortExpression.json"
validateCohortJson <- jsonvalidate::json_validator(schemaPath)
pkgDir <- system.file(package = "PhenotypeLibrary")
phenotypeIds <- list.files(pkgDir) %>%
stringr::str_subset("\\d+")
jsonFiles <- purrr::map(phenotypeIds, ~paste0(glue::glue("{pkgDir}/{.}/"), list.files(glue::glue("{pkgDir}/{.}")))) %>%
unlist() %>%
stringr::str_subset("\\.json")
validationResult <- purrr::map_lgl(jsonFiles, validateCohortJson)
mean(validationResult)
#> [1] 1 Created on 2021-02-12 by the reprex package (v0.3.0) |
you should also check that it identifies an invalid json. Make a copy of one that uses ConditionOccurrence , and change the expression:
to:
That should fail validation, I think. |
That passes unfortunately :/ |
Hnmm, so...not sure what to think here. On one hand that could just look like another field that isn't invalid, just unused. I am not sure if it should be strick that every field has to be defined, but we make it so that older defs work fine in new versions (the older defs won't have the newer fields defined)...so...I need to think about what we need to do to validate... Another test is: find a conceptId and set the json from a number to a string (like codesetId: 1 -> codesetId: 'invalid'. |
Hi, |
I did a few small checks that I think should all be considered invalid. All three of the schema versions gave identical results.
|
I'll run some examples too, and maybe open an issue on the rjson validator repo with some simple test cases to demonstrate where we think it should result in an invalid eval. |
Could the official version of this schema into the repository so that it could be referenced (and bug reports made?). Also, I noticed that some of the fields in the JSON out there are missing, while the "type" is not marked as being optional. For example, "Title" and "cdmVersionRange" is missing in many JSONs in the wild. |
Is there a document that formally specifies the json representation of circe cohorts and possibly a function that takes in json and returns true or false depending on if the json conforms to the circe specification?
The text was updated successfully, but these errors were encountered: