Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Handle anyOf in JSON Schema property #379

Conversation

eecavanna
Copy link
Collaborator

Fixes #378

@eecavanna eecavanna changed the title fix: Handle anyOf in JSON Schema property fix: Handle anyOf in JSON Schema property Nov 13, 2023
@eecavanna eecavanna marked this pull request as ready for review November 13, 2023 20:08
@eecavanna eecavanna requested a review from dwinston November 13, 2023 20:08
@eecavanna
Copy link
Collaborator Author

I've drafted this function, but haven't tested it yet. I'm trying to get unit tests or doctests running locally.

@dwinston
Copy link
Collaborator

Grep the codebase for additional uses of “$ref” in .py files. I think there are a few other instances that we want to convert to more robust access.

@eecavanna
Copy link
Collaborator Author

eecavanna commented Nov 13, 2023

I will do that and add fixes to this PR.


Update: That's turning out to be a bit of a rabbit hole (since some things, and the things that use those things, were written under the assumption that a collection always maps to at most one class).

@eecavanna
Copy link
Collaborator Author

eecavanna commented Nov 14, 2023

@dwinston, I don't know how I can adapt this statement to work in the new "a collection can be associated with multiple classes" situation. It's within a 175-line function in the changesheet code.

else:
class_name = class_name_dict[collection_name_to_class_name[collection_name]]

In this PR branch, I have arbitrarily chosen to use the first class in the list of classes. 🤷

        else:
            # FIXME: As of nmdc-schema v9.1.0 (maybe earlier), a collection name can map to multiple class names.
            #        In the case of multiple class names, which one do you want to assign to `class_name`?
            #        Here, I have arbitrarily chosen to use the first class name in the list.
            class_names = collection_name_to_class_names[collection_name]
            class_name = class_name_dict[class_names[0]]

Source: https://github.com/microbiomedata/nmdc-runtime/pull/379/files#diff-a981e2387918aa46e3575eddf8d3b3efd43dc78ca8be2a03f3b974e3887d6368R171-R176

@eecavanna
Copy link
Collaborator Author

@dwinston, I adapted this function and the associated search.html page, to work in the new "a collection can be associated with multiple classes" situation; with the caveat that the web page will ignore any classes after the first one.

def documentation_links(jsonschema_dict, collection_names):
rv = {"Activity": []}
for cn in collection_names:
last_part = jsonschema_dict["$defs"]["Database"]["properties"][cn]["items"][
"$ref"
].split("/")[-1]
entity_attrs = list(jsonschema_dict["$defs"][last_part]["properties"])
if last_part in ("Biosample", "Study", "DataObject"):
assoc_path = [cn]
else:
assoc_path = ["activity_set", cn]
rv = assoc_in(
rv,
assoc_path,
{
"collection_name": cn,
"entity_url": "https://microbiomedata.github.io/nmdc-schema/"
+ last_part,
"entity_name": last_part,
"entity_attrs": sorted(
[
{
"url": f"https://microbiomedata.github.io/nmdc-schema/{a}",
"attr_name": a,
}
for a in entity_attrs
],
key=itemgetter("attr_name"),
),
},
)
return rv

@eecavanna
Copy link
Collaborator Author

Hi @dwinston, I finished making all the changes to this branch that I expect to make. I want to know what you think about what I said in these two comments:

I'm OK with you making additional commits to this branch.

@dwinston
Copy link
Collaborator

thanks for tackling this, @eecavanna. I don't love the added complexity. I'd like to try and identify a more elegant solution than this band-aid, realizing that this bug is blocking folks, so if I can't identify a better route soon, I'll go with this (after careful review). Again, thank you.

mbthornton-lbl and others added 22 commits November 20, 2023 16:24
unauthenticated.

closes #385
* Consolidate workflows for building docker images and deploying to Spin into one workflow

* Remove docker-build.sh in favor of letting GitHub Actions handle Docker build and push

* Update Release Process doc with info about initiating via GitHub Releases

* Replace Rancher-Action with generic HTTP call

* Replace release event with tag push event, which is required for semver metadata

* Remove unnecessary pr event

* Add more release instructions
This reverts commit 1b2372d.
Copy link
Collaborator

@dwinston dwinston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of a more straightforward way to handle the appearance of one-to-many collection-to-class relationships, and I think we should shoot for straightforward for now.

@dwinston dwinston merged commit 30e3838 into main Nov 20, 2023
@eecavanna eecavanna self-assigned this Nov 20, 2023
@eecavanna eecavanna added the bug Something isn't working label Nov 20, 2023
@eecavanna eecavanna deleted the 378-error-keyerror-ref-occurred-on-dagster-after-using-metadatajsonsubmit-endpoint branch November 22, 2023 00:42
dwinston added a commit that referenced this pull request Nov 28, 2023
* fix: Handle `anyOf` in JSON Schema property

* Indicate function return value type

* Refactor function and add comments

* fix: Prefix class name with `nmdc:`

* Implement helper function to process both single-ref and multi-ref specs

* Document prefix functionality

* Fix punctuation in comment

* Update dictionary and function to accommodate multiple classes per collection

* WIP: Update doc link maker to accommodate collections that map to multiple classes

* Clarify variable names

* Add comments in an attempt to clarify code

* Delete commented-out code that doesn't accommodate multi-class collections

* Add tests covering some corner cases

* Fix inaccurate type hint

* Clarify docstring

* Replace reference to nonexistent dict and implement preliminary patch

* Make the collection name bold on the search page

* Update search page to account for collections mapping to multiple classes

* Remove redundant type hints

* style: black format

* panic on no-type given

* add script and api function

* update script

* Refactor runtime client methods to raise for status and parse and return results

* handle omics processing records

* update docstring

* update to include correct prefix

* update to use use new insdc_bioproject_identifiers slot on omics_processing

* style: black format

* add typecodes enpoint (#386)

unauthenticated.

closes #385

* update .gitpod.yml

* add sshproxy.sh for nersc tunneling

* update make cmd

* add gitpod affordance

* add gitpod dockerfile

* update gitpod stuff

* rename

* update Makefile

* fix

* gitpod: pull dev mdb

* fix

* fix make target

* Separate dev and production deployments in GitHub workflow (#382)

* Consolidate workflows for building docker images and deploying to Spin into one workflow

* Remove docker-build.sh in favor of letting GitHub Actions handle Docker build and push

* Update Release Process doc with info about initiating via GitHub Releases

* Replace Rancher-Action with generic HTTP call

* Replace release event with tag push event, which is required for semver metadata

* Remove unnecessary pr event

* Add more release instructions

* style: fix, and elaborate a bit

* Revert stuff

This reverts commit 1b2372d.

* style: fix, and elaborate a bit

---------

Co-authored-by: eecavanna <[email protected]>
Co-authored-by: Donny Winston <[email protected]>
Co-authored-by: Michael Thornton <[email protected]>
Co-authored-by: Donny Winston <[email protected]>
Co-authored-by: Jing Cao <[email protected]>
Co-authored-by: Patrick Kalita <[email protected]>
dwinston added a commit that referenced this pull request Nov 28, 2023
* add stuff

* Update json.dump() to json.dumps() in user.py

* Mix in orcid jwt flow with client_credentials flow

closes #333

* update .gitpod.yml

* add sshproxy.sh for nersc tunneling

* update make cmd

* add gitpod affordance

* add gitpod dockerfile

* update gitpod stuff

* rename

* update Makefile

* fix

* gitpod: pull dev mdb

* fix

* fix make target

* Separate dev and production deployments in GitHub workflow (#382)

* Consolidate workflows for building docker images and deploying to Spin into one workflow

* Remove docker-build.sh in favor of letting GitHub Actions handle Docker build and push

* Update Release Process doc with info about initiating via GitHub Releases

* Replace Rancher-Action with generic HTTP call

* Replace release event with tag push event, which is required for semver metadata

* Remove unnecessary pr event

* Add more release instructions

* fix: Handle `anyOf` in JSON Schema property (#379)

* fix: Handle `anyOf` in JSON Schema property

* Indicate function return value type

* Refactor function and add comments

* fix: Prefix class name with `nmdc:`

* Implement helper function to process both single-ref and multi-ref specs

* Document prefix functionality

* Fix punctuation in comment

* Update dictionary and function to accommodate multiple classes per collection

* WIP: Update doc link maker to accommodate collections that map to multiple classes

* Clarify variable names

* Add comments in an attempt to clarify code

* Delete commented-out code that doesn't accommodate multi-class collections

* Add tests covering some corner cases

* Fix inaccurate type hint

* Clarify docstring

* Replace reference to nonexistent dict and implement preliminary patch

* Make the collection name bold on the search page

* Update search page to account for collections mapping to multiple classes

* Remove redundant type hints

* style: black format

* panic on no-type given

* add script and api function

* update script

* Refactor runtime client methods to raise for status and parse and return results

* handle omics processing records

* update docstring

* update to include correct prefix

* update to use use new insdc_bioproject_identifiers slot on omics_processing

* style: black format

* add typecodes enpoint (#386)

unauthenticated.

closes #385

* update .gitpod.yml

* add sshproxy.sh for nersc tunneling

* update make cmd

* add gitpod affordance

* add gitpod dockerfile

* update gitpod stuff

* rename

* update Makefile

* fix

* gitpod: pull dev mdb

* fix

* fix make target

* Separate dev and production deployments in GitHub workflow (#382)

* Consolidate workflows for building docker images and deploying to Spin into one workflow

* Remove docker-build.sh in favor of letting GitHub Actions handle Docker build and push

* Update Release Process doc with info about initiating via GitHub Releases

* Replace Rancher-Action with generic HTTP call

* Replace release event with tag push event, which is required for semver metadata

* Remove unnecessary pr event

* Add more release instructions

* style: fix, and elaborate a bit

* Revert stuff

This reverts commit 1b2372d.

* style: fix, and elaborate a bit

---------

Co-authored-by: eecavanna <[email protected]>
Co-authored-by: Donny Winston <[email protected]>
Co-authored-by: Michael Thornton <[email protected]>
Co-authored-by: Donny Winston <[email protected]>
Co-authored-by: Jing Cao <[email protected]>
Co-authored-by: Patrick Kalita <[email protected]>

* Update build-and-release-to-spin.yml

* fix: Update build-and-release-to-spin.yml

* Replace illegal variable defined using other variable (#389)

Co-authored-by: Donny Winston <[email protected]>

* Do full depth checkout so setuptools-scm can detect version

* Rever to using env context for variables

* Use local path context when running docker build so that it knows about git tags

* Actually push the docker image, but not if running in a fork

* Fix use of boolean var

* Env context only available in `steps.if`

* Update .dockerignore

* Update main.py to display scm version

* Allow study metadata not captured in submission portal to be passed to SubmissionPortalTranslator

* Allow passing doi category to SubmissionPortalTranslator, use dataset_doi by default

* Connect additional study translator parameters to Dagster op inputs

* Allow additional Biosample metadata to be passed in via external CSV file

* Add PyPI URL and elaborate on manual publishing process

* Add related links section to `README.md`

* ensure no w3id.org loop; use data portal API (#403)

closes #402

* Mix in orcid jwt flow with client_credentials flow

closes #333

* style: rm print statement

* style: add docstring

* style: add commentary

* style: rm print-debugging

* fix: rm unneeded import

---------

Co-authored-by: Jing Cao <[email protected]>
Co-authored-by: Patrick Kalita <[email protected]>
Co-authored-by: eecavanna <[email protected]>
Co-authored-by: eecavanna <[email protected]>
Co-authored-by: Michael Thornton <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Error KeyError: '$ref' occurred on Dagster after using /metadata/json:submit endpoint
5 participants