Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve validators graph #212

Closed
Stranger6667 opened this issue May 3, 2021 · 1 comment
Closed

Improve validators graph #212

Stranger6667 opened this issue May 3, 2021 · 1 comment

Comments

@Stranger6667
Copy link
Owner

Stranger6667 commented May 3, 2021

The core idea behind this library is to store all keywords as a tree similar to the original schema JSON object. However, the implementation is not as efficient as I'd like it to be. Primarily, it has the following disadvantages:

  • Each node's siblings are stored in a separate vector, which leads to a very fragmented graph that negatively affects cache locality;
  • It doesn't allow cycles. The primary use case is the $ref keyword, where some compilation logic is happening during validation. These things are cached, though, but the cost of RWLock is visible in flame graphs. Also, when logically the same validator is called recursively -> some nodes are re-created, but theoretically, they can be re-used.

This issue is the starting point to rethink the implementation and maybe to implement something better.

Here are some thoughts I have in mind:

  • It might be possible to use an arena to allocate all boxed validators, then navigating could be done with indexes (maybe);
  • An interesting article on using indexes in graphs;
  • It might be nice to have some tool to display the validators graph.
  • More ideas

Later I'll draw the current implementation, so all the overhead is more visible.

@Stranger6667
Copy link
Owner Author

Stranger6667 commented Dec 29, 2021

At the moment, I think that the following layout will be a good step in this direction:

  • Store validators in a vector
  • Store metadata (relative_path / absolute_path) in a separate vector
  • Pass &[BoxedValidator] to each validator in is_valid / validate / apply so each validator can call dependent validators
  • Use indexes to access dependent validators/metadata (as a single one or a range)

This should improve cache locality as all validators will be in the same vector + remove costs for following references between the validation nodes (it will be array access by index) + is_valid won't have extra costs for loading larger SchemaNode as it will be just boxed validator

As the next step, it will be easier to evaluate $ref during the compilation phase

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant