templated schema-as-yaml modifier #247

Draft: wants to merge 5 commits into base: main

@turbomam turbomam commented Oct 30, 2024

To replace the vast majority of the uses of yq in this and potentially other schemas

example run:

poetry run python src/nmdc_submission_schema/scripts/qy2.py \
    --schema local/with_shuttles.yaml \
    --config qy2-config.tsv \
    --output qy2-output.yaml

current weaknesses:

  • highly repetitive, linear code in src/nmdc_submission_schema/scripts/qy2.py. needs refactoring
  • are there opportunities to integrate with the modifications or validation-updating phases of sheets_and_friends' modifications_and_validation?
  • the config file qy2-config.tsv lives in the repo root
  • output has been going to the repo root. that will obviously change in the makefile, but in the meantime it leads to slow re-indexing by PyCharm (put the output in a non-indexed directory!)
  • several different kinds of empty-ish values are currently in the configuration sheet, so the script tests for "", "NULL" and None
  • the active column in the config file is only tested for 'TRUE' after upper-casing. other true-ish values are not recognized
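
The last two points could be folded into a pair of small helpers. This is a minimal sketch, not the code currently in qy2.py: the helper names and the set of accepted true-ish spellings are assumptions, and only "", "NULL" and None are treated as empty, as described above.

```python
# Hypothetical helpers; names and the TRUTHY_MARKERS set are assumptions.
EMPTY_MARKERS = {"", "NULL"}
TRUTHY_MARKERS = {"TRUE", "T", "YES", "Y", "1"}


def is_empty(value):
    """Return True for None, "" and "NULL" (surrounding whitespace ignored)."""
    if value is None:
        return True
    return str(value).strip() in EMPTY_MARKERS


def is_active(value):
    """Return True when the 'active' cell holds a recognized true-ish value."""
    if is_empty(value):
        return False
    return str(value).strip().upper() in TRUTHY_MARKERS
```

Centralizing the checks means a new empty-ish or true-ish spelling only has to be added in one place.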

opportunities

  • convert more kinds of dicts to simple collection format, like prefixes
  • remove types that could be imported from linkml and then assert the import
    • there are some things that aren't discovered as dependencies in the shuttle phase, like types, so those are in the base schema file
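
For the prefixes case, the conversion could look roughly like the sketch below. It assumes the expanded form uses the LinkML metamodel keys `prefix_prefix` and `prefix_reference`; the function name `compact_prefixes` is hypothetical and nothing here is taken from qy2.py.

```python
def compact_prefixes(prefixes):
    """Collapse expanded prefix entries into the simple {prefix: URI} form.

    Entries that are already plain strings are passed through unchanged.
    """
    compact = {}
    for key, entry in prefixes.items():
        if isinstance(entry, dict):
            # Expanded form: {"prefix_prefix": ..., "prefix_reference": ...}
            compact[entry.get("prefix_prefix", key)] = entry["prefix_reference"]
        else:
            compact[key] = entry
    return compact


expanded = {
    "CHEBI": {
        "prefix_prefix": "CHEBI",
        "prefix_reference": "http://purl.obolibrary.org/obo/CHEBI_",
    },
    "linkml": "https://w3id.org/linkml/",
}
compacted = compact_prefixes(expanded)
```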

probably won't be able to support setting a list of examples like

yq '(.classes.[].slot_usage.[] | select(.name=="chem_administration") | .examples) = [{"value": "agar [CHEBI:2509];2018-05-11|agar [CHEBI:2509];2018-05-22"}, {"value": "agar [CHEBI:2509];2018-05"}]'

yq -i '(.classes.[].slot_usage.[] | select(.string_serialization=="{text};{float} {unit}" and .multivalued == true ) | .pattern) = "^([^;\t\r\x0A]+;[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)? [^;\t\r\x0A]+\|)*([^;\t\r\x0A]+;[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)? [^;\t\r\x0A]+)$"'

probably won't be able to act on multiple criteria like

yq '(.classes.[].slot_usage.[] | select(.range == "QuantityValue" and .multivalued == true)  | .pattern) = "^([-+]?[0-9]*\.?[0-9]+ +\S.*\|)*([-+]?[0-9]*\.?[0-9]+ +\S.*)$"'
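
Outside the templated config, matching on several criteria at once is easy to express in plain Python. A minimal sketch, assuming the schema has already been loaded into nested dicts; `set_where`, `ExampleInterface` and the slot names are hypothetical, and only the pattern is copied from the yq command above.

```python
QUANTITY_PATTERN = r"^([-+]?[0-9]*\.?[0-9]+ +\S.*\|)*([-+]?[0-9]*\.?[0-9]+ +\S.*)$"


def set_where(schema, criteria, key, value):
    """Set key=value on every slot_usage entry that matches ALL criteria.

    Returns the number of entries that were changed.
    """
    changed = 0
    for cls in schema.get("classes", {}).values():
        for usage in ((cls or {}).get("slot_usage") or {}).values():
            if all(usage.get(k) == v for k, v in criteria.items()):
                usage[key] = value
                changed += 1
    return changed


# Hypothetical, hand-built schema fragment for illustration only.
schema = {
    "classes": {
        "ExampleInterface": {
            "slot_usage": {
                "slot_a": {"range": "QuantityValue", "multivalued": True},
                "slot_b": {"range": "QuantityValue", "multivalued": False},
            }
        }
    }
}
changed = set_where(
    schema,
    {"range": "QuantityValue", "multivalued": True},
    "pattern",
    QUANTITY_PATTERN,
)
```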

probably won't be able to assign or migrate rules like

yq eval-all \
    'select(fileIndex==1).classes.JgiMgInterface.rules = select(fileIndex==0).classes.Biosample.rules | select(fileIndex==1)' \
    local/nmdc.yaml $@.raw | cat > $@.raw2
yq eval-all \
    'select(fileIndex==1).classes.JgiMtInterface.rules = select(fileIndex==0).classes.Biosample.rules | select(fileIndex==1)' \
    local/nmdc.yaml $@.raw2 | cat > $@.raw
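
If this stays outside the config-driven modifier, the same migration could be done in a few lines of Python. A rough sketch, assuming both schemas are loaded as nested dicts; `copy_rules` and the example rule are hypothetical, while the class names come from the commands above.

```python
import copy


def copy_rules(source_schema, source_class, target_schema, target_classes):
    """Copy the rules of source_class onto each class in target_classes."""
    rules = source_schema["classes"][source_class].get("rules", [])
    for name in target_classes:
        # Deep copy so later edits to one class do not leak into the others.
        target_schema["classes"][name]["rules"] = copy.deepcopy(rules)


# Hypothetical stand-ins for local/nmdc.yaml and the schema being built.
nmdc = {"classes": {"Biosample": {"rules": [{"title": "example_rule"}]}}}
submission = {"classes": {"JgiMgInterface": {}, "JgiMtInterface": {}}}

copy_rules(nmdc, "Biosample", submission, ["JgiMgInterface", "JgiMtInterface"])
```

This would also avoid the $@.raw / $@.raw2 round trip, since both target classes are updated in one pass.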

in addition to that, probably won't be able to act on wildcards

yq -i 'del(.classes.JgiMgInterface.rules.[] | select(.title == "rna*"))' $@.raw
yq -i 'del(.classes.JgiMtInterface.rules.[] | select(.title == "dna*"))' $@.raw
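
Wildcard deletion is another case that is short in plain Python. A sketch using the standard library's `fnmatch`, assuming rules are a list of dicts with a `title` key; `drop_rules_by_title` is a hypothetical name, and the example data is reduced to titles only.

```python
from fnmatch import fnmatchcase


def drop_rules_by_title(schema, class_name, title_glob):
    """Remove rules whose title matches a shell-style glob such as 'rna*'.

    Returns the number of rules removed.
    """
    cls = schema["classes"][class_name]
    rules = cls.get("rules", [])
    kept = [r for r in rules if not fnmatchcase(r.get("title", ""), title_glob)]
    cls["rules"] = kept
    return len(rules) - len(kept)


# Illustrative data: both interfaces start with all four rule titles.
titles = [
    "dna_well_requires_plate",
    "dna_plate_requires_well",
    "rna_well_requires_plate",
    "rna_plate_requires_well",
]
schema = {
    "classes": {
        "JgiMgInterface": {"rules": [{"title": t} for t in titles]},
        "JgiMtInterface": {"rules": [{"title": t} for t in titles]},
    }
}
removed_mg = drop_rules_by_title(schema, "JgiMgInterface", "rna*")
removed_mt = drop_rules_by_title(schema, "JgiMtInterface", "dna*")
```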

turbomam commented Oct 30, 2024

rules that remain in src/nmdc_submission_schema/schema/nmdc_submission_schema.yaml without using src/nmdc_submission_schema/scripts/qy2.py:

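the two rule sets might be essentially the same? as listed below, they differ only in the dna/rna prefix used in the slot names, titles and descriptions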

classes.JgiMgInterface

    rules:
      - preconditions:
          slot_conditions:
            dna_cont_well:
              name: dna_cont_well
              pattern: .+
        postconditions:
          slot_conditions:
            dna_cont_type:
              name: dna_cont_type
              equals_string: plate
        description: DNA samples shipped to JGI for metagenomic analysis in tubes can't have any value for their plate position.
        title: dna_well_requires_plate
      - preconditions:
          slot_conditions:
            dna_cont_type:
              name: dna_cont_type
              equals_string: plate
        postconditions:
          slot_conditions:
            dna_cont_well:
              name: dna_cont_well
              pattern: ^(?!A1$|A12$|H1$|H12$)(([A-H][1-9])|([A-H]1[0-2]))$
        description: DNA samples in plates must have a plate position that matches the regex. Note the requirement for an empty string in the tube case. Waiting for value_present validation to be added to runtime
        title: dna_plate_requires_well

classes.JgiMtInterface

    rules:
      - preconditions:
          slot_conditions:
            rna_cont_well:
              name: rna_cont_well
              pattern: .+
        postconditions:
          slot_conditions:
            rna_cont_type:
              name: rna_cont_type
              equals_string: plate
        description: RNA samples shipped to JGI for metagenomic analysis in tubes can't have any value for their plate position.
        title: rna_well_requires_plate
      - preconditions:
          slot_conditions:
            rna_cont_type:
              name: rna_cont_type
              equals_string: plate
        postconditions:
          slot_conditions:
            rna_cont_well:
              name: rna_cont_well
              pattern: ^(?!A1$|A12$|H1$|H12$)(([A-H][1-9])|([A-H]1[0-2]))$
        description: RNA samples in plates must have a plate position that matches the regex. Note the requirement for an empty string in the tube case. Waiting for value_present validation to be added to runtime
        title: rna_plate_requires_well

Successfully merging this pull request may close these issues.

Identify which types of schema modifications are currently performed by yq