Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of referencing library vs deprecated jsonschema.RefResolver is very bad when there are a lot of references in schema #178

Open
nathan-stender opened this issue Sep 25, 2024 · 3 comments

Comments

@nathan-stender
Copy link

nathan-stender commented Sep 25, 2024

Hello!

I have a library that formats scientific data into a JSON schema called the Allotrope Standard Model (ASM)

The validation schemas are fairly large and complicated compared to other schemas I've seen in discussion boards, and are very modular, meaning there are a lot of references. In allotropy we store the ASM schemas directly, and remove all remote references, replacing them with local references under $defs.

We are finding that validating against the schemas using jsonschema version 4.18.0 takes ~20x longer than 4.17.0.

As a concrete example:

Validating this data: https://raw.githubusercontent.com/Benchling-Open-Source/allotropy/refs/heads/main/tests/parsers/moldev_softmax_pro/testdata/MD_SMP_luminescence_endpoint_example08.json

Against this schema: https://github.com/Benchling-Open-Source/allotropy/blob/main/src/allotropy/allotrope/schemas/adm/plate-reader/REC/2024/06/plate-reader.schema.json

takes ~3.5s on 4.17.0 and ~55s on 4.18.0

This translates to a runtime for all 26 tests in tests/parsers/moldev_softmax_pro of ~30s in 4.17.0 to ~6m in 4.18.0

@Julian
Copy link
Member

Julian commented Sep 25, 2024

Hey there, I'm happy to have a look at this at some point, but is there a reason you're benchmarking against such an old version? Lots has changed since 4.18, so it'd be good if you shared numbers which were on 4.23.

@nathan-stender
Copy link
Author

nathan-stender commented Sep 25, 2024

Sorry, I didn't mention that I tested on every version between 4.18 and 4.23 to see if any had better performance. None of the versions past 4.18 improve the performance noticeably.

On 4.23, the results are actually a bit worse:

For the single test: 55s
For the 26 tests: 6m33s

@sameeul
Copy link

sameeul commented Nov 14, 2024

We have also experienced similar performance issue in one of our tool after switching from RefResolver to this library. This is the commit in our library: PolusAI/workflow-inference-compiler#287

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants