Update bulk_validation_referential_integrity_check notebook to concur with refscan (no false positives) #796
In this branch, I updated docs/nb/bulk_validation_referential_integrity_check.ipynb so that it prefers the union of a LinkML slot's any_of ranges, when present, to the value of range (cf. refscan's get_names_of_classes_in_effective_range_of_slot). Furthermore, I dropped the use of concurrent.futures.ThreadPoolExecutor, as its usage results in inconsistent collections of errors from run to run. Tuning efficiency as needed will be raised as a separate issue.
Details
The notebook now returns 33 "not found" errors and zero "invalid type" errors when run with nmdc-schema v11.1.0 on /global/cfs/projectdirs/m3408/nmdc-mongodumps/dump_nmdc-prod_2024-11-25_20-12-02/nmdc, in alignment with the corresponding refscan_report.20241126_083003_UTC.schema_v11.1.0.nmdc.violations.tsv
....
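The range-resolution rule described above (prefer the union of a slot's any_of ranges when present, otherwise fall back to its range) can be sketched roughly as follows. This is a minimal illustration, not the notebook's or refscan's actual code: the function name effective_range_names is hypothetical, plain dicts stand in for LinkML slot definitions, and the class names in the usage lines are only examples.

```python
from typing import Any


def effective_range_names(slot: dict[str, Any]) -> list[str]:
    """Return the names that make up a slot's effective range.

    Assumption: `slot` is a plain dict shaped like a LinkML slot definition,
    with an optional "range" string and an optional "any_of" list of
    expressions that may each carry their own "range".
    """
    # Collect the range of every any_of expression that declares one.
    any_of_ranges = [
        expr["range"] for expr in (slot.get("any_of") or []) if expr.get("range")
    ]
    if any_of_ranges:
        # Union of the any_of ranges: deduplicate while preserving order.
        return list(dict.fromkeys(any_of_ranges))
    # No usable any_of ranges: fall back to the slot's own range, if any.
    return [slot["range"]] if slot.get("range") else []


# Illustrative slots (hypothetical data, not taken from nmdc-schema).
slot_with_any_of = {
    "range": "NamedThing",
    "any_of": [{"range": "Biosample"}, {"range": "ProcessedSample"}],
}
slot_plain = {"range": "Study"}

print(effective_range_names(slot_with_any_of))
print(effective_range_names(slot_plain))
```

With a rule like this, a reference is only reported as an "invalid type" error when the referenced document's class falls outside every class named by any_of, rather than outside the single class named by range.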
Related issue(s)
Fixes #576
...
Related subsystem(s)
(docs directory)
Testing
I tested these changes by...
Documentation
(docs directory)
Maintainability
(study_id: str)
# TODO or # FIXME
black to format all the Python files I created/modified