simd support? #368

Closed
jqnatividad opened this issue May 3, 2022 · 7 comments

Comments

@jqnatividad
Contributor

jsonschema-rs is already quite performant.

However, have you looked into using crates like simd-json or simdutf8 to make it even faster?

@Stranger6667
Owner

Yes,

I am actively looking into these things and want to publish a design document to get feedback on the implementation. It will also serve as a rough roadmap to 1.0 and will cover at least the following areas:

  • Keywords layout, as described in "Improve validators graph" (#212). I started a complete rewrite in a separate repo, and this change alone yields a ~50% reduction in validation time in some benchmarks and simplifies the code dramatically (it also uses enum_dispatch). It is incomplete, but it unlocks a lot: for example, there will be no need for an RwLock in $ref, as it will be possible to evaluate references at the compilation phase.
  • Custom input types. This seems like the way to support the crates you mentioned plus other external types (like Python ones). I'm not sure what the best way to do this would be :( My attempts to wrap serde_json::Value without sacrificing too much were not successful.
  • Real error iterator. Currently there are tons of unnecessary allocations on each validate call, and all the flat_map calls are responsible for long compile times (according to llvm-lines). I'd like to have a tree iterator that doesn't allocate intermediate vectors, though I'm not sure about the right way to suspend/resume such a process. Maybe a separate state machine transition table would work for this.
  • Avoid the extra cost of SchemaNode: it is not needed for is_valid and validate calls, but it adds overhead.
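
To make the first and third points concrete, here is a minimal, std-only sketch of the general idea. It is not the code from the rewrite: the `Value`, `Keyword`, and `Schema` types are hypothetical stand-ins, and the `match` is written by hand rather than generated by enum_dispatch. It only illustrates dispatching over an enum of keywords instead of boxed trait objects, and yielding errors lazily instead of collecting intermediate vectors.

```rust
// Hypothetical sketch: none of these names are taken from jsonschema-rs.

/// A tiny stand-in for a JSON value (the real crate validates `serde_json::Value`).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Number(f64),
    String(String),
}

/// Each compiled keyword is one enum variant, so dispatch is a `match`
/// rather than a virtual call through a boxed trait object.
#[derive(Debug, Clone, PartialEq)]
pub enum Keyword {
    Minimum(f64),
    Maximum(f64),
    MaxLength(usize),
}

impl Keyword {
    /// Fast path: returns a bool and never builds an error value.
    pub fn is_valid(&self, instance: &Value) -> bool {
        match (self, instance) {
            (Keyword::Minimum(limit), Value::Number(n)) => n >= limit,
            (Keyword::Maximum(limit), Value::Number(n)) => n <= limit,
            // JSON Schema counts string length in Unicode code points.
            (Keyword::MaxLength(limit), Value::String(s)) => s.chars().count() <= *limit,
            // A keyword does not constrain values of other types.
            _ => true,
        }
    }
}

/// A "compiled schema" here is just a flat list of keywords.
pub struct Schema {
    keywords: Vec<Keyword>,
}

impl Schema {
    pub fn new(keywords: Vec<Keyword>) -> Self {
        Schema { keywords }
    }

    /// Short-circuits on the first failing keyword.
    pub fn is_valid(&self, instance: &Value) -> bool {
        self.keywords.iter().all(|k| k.is_valid(instance))
    }

    /// Lazily yields the failing keywords; nothing is collected into a
    /// temporary `Vec` unless the caller asks for it.
    pub fn errors<'a>(&'a self, instance: &'a Value) -> impl Iterator<Item = &'a Keyword> + 'a {
        self.keywords.iter().filter(move |k| !k.is_valid(instance))
    }
}
```

A caller that only needs a yes/no answer would use `schema.is_valid(&value)`, while `schema.errors(&value)` can be consumed one item at a time. A real tree-shaped schema would need more machinery than this flat list, which is exactly the open question in the third point.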

I expect to have it ready in a few days, and it is roughly my roadmap for this lib :) I'd appreciate it if you could share your thoughts on this or share your use case for integrating the crates you mentioned.

@jqnatividad
Contributor Author

Sorry I didn't get back earlier, but thanks for your thorough response!

I don't know enough about your implementation to comment cogently on your points, but the details I can tease out indicate that there's a lot of headroom the library can exploit to squeeze out more performance.

I'm looking forward to the design document!

What I can contribute are my use-cases.

Currently, I'm using jsonschema-rs to validate CSV files (that's why I originally asked about #339), and after adding rayon, the performance is already quite impressive.

dathere/qsv#164

But as the flamegraph shows, any incremental performance gain in jsonschema will further accelerate qsv's validate command.
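
For readers unfamiliar with the pattern, the row-parallel validation can be sketched roughly as follows. This is not qsv's actual code: it uses `std::thread::scope` (Rust 1.63+) in place of rayon to stay dependency-free, and `row_is_valid` is a hypothetical stand-in for a call into a compiled JSON Schema validator.

```rust
use std::thread;

/// Hypothetical per-row check standing in for a compiled schema validator.
/// Assumed rule: exactly two fields, and the second parses as an unsigned integer.
fn row_is_valid(row: &[&str]) -> bool {
    row.len() == 2 && row[1].parse::<u32>().is_ok()
}

/// Validates rows on up to `workers` threads and returns the indices of
/// invalid rows in ascending order.
pub fn invalid_rows(rows: &[Vec<&str>], workers: usize) -> Vec<usize> {
    let workers = workers.max(1);
    // Ceiling division, clamped to 1 because `chunks` panics on a size of 0.
    let chunk_len = ((rows.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        // Spawn one scoped thread per chunk; collecting the handles first
        // ensures every thread is started before any is joined.
        let handles: Vec<_> = rows
            .chunks(chunk_len)
            .enumerate()
            .map(|(chunk_idx, chunk)| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .enumerate()
                        .filter(|(_, row)| !row_is_valid(row))
                        // Convert the chunk-local index back to a global row index.
                        .map(|(i, _)| chunk_idx * chunk_len + i)
                        .collect::<Vec<usize>>()
                })
            })
            .collect();

        // Joining in spawn order keeps the indices sorted.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect::<Vec<usize>>()
    })
}
```

Because each row is validated independently, the per-row validation cost is what remains on the hot path once the work is spread across cores, which is why any speedup in the validator itself carries over directly.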

I plan to leverage the qsv validate command in another project - https://github.com/dathere/datapusher-plus to validate CSV files before they are uploaded to CKAN.

@manuschillerdev

@Stranger6667

#212. I started a complete rewrite in a separate repo and just this change yields ~50% validation time reduction in some benchmarks + simplifies the code dramatically (it also uses enum_dispatch). Though it is incomplete but unlocks a lot - for example, there will be no need for RwLock in $ref as it will be possible to evaluate them at the compilation phase.

That sounds really interesting! Is that repo publicly available?

@Stranger6667
Owner

@manuschillerdev I added it as a separate crate here - #373 :) It is a prototype, but ref resolving is more or less ready

Btw, @jqnatividad thanks for sharing your use case! I hope that soon we all can benefit from faster validation! :)

The changes are quite large, though, and I’d appreciate any help there :)

@jqnatividad
Contributor Author

@Stranger6667 I'll start testing the jsonschema-csr prototype and will let you know my findings!

I need to update qsv's benchmarks soonish and I'll be sure to include the prototype in it when I do.

And once I grok the internals, you can be sure I'll try to help as best as I can.

@Stranger6667
Owner

@jqnatividad Thank you! The currently submitted version is not working yet, but I am slowly working on it :)

@Stranger6667
Owner

Closing this for now. The next release will use uuid-simd for UUID validation. Otherwise, I'll track performance improvements in separate issues.

3 participants