Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regex bad performance #61

Closed
catalinstochita opened this issue Feb 2, 2024 · 5 comments · Fixed by #62
Closed

Regex bad performance #61

catalinstochita opened this issue Feb 2, 2024 · 5 comments · Fixed by #62
Labels
enhancement New feature or request

Comments

@catalinstochita
Copy link
Contributor

catalinstochita commented Feb 2, 2024

Hey. I'm using this library at work and I have noticed some slowness when using regex, looks like creating a Regex is somehow slow. I have tested the theory using a global map for storing the compiled regex and it improved the speed ~50x. The implementation is not what we want, we should create the regex when creating the jsonpath and use it directly. I'm not sure how we can do that.

Here is the changes I made for testing:
https://github.com/catalinstochita/jsonpath-rust

Let me know if you have an idea of how we can do that. Thank you @besok

@besok
Copy link
Owner

besok commented Feb 2, 2024

Hi, sure. I will take a look shortly.

@besok
Copy link
Owner

besok commented Feb 4, 2024

Yeah, the regular expressions are heavy and the general advise is to avoid recompiling them as much as possible.

Thus, I can't see any other way but to solve it with something like a cache with the precompiled expressions.

So, essentially, the library can only provide the option to accept the regex cache and use it.

struct JsPathRegexCache {
    cache: Arc<Mutex<HashMap<String, Regex>>>,
}
pub struct JsonPathConfig {
    pub with_regex_cache: bool,
    pub regex_cache: JsPathRegexCache,
}

fn example(global_cache:JsPathRegexCache) {
    let json = Box::new(json!({}));
    let cfg = JsonPathConfig::with_cache(global_cache)
    let _v = (cfg, json).path("$.[?(@.author ~= '.*(?i)d\\(Rees\\)')]")
        .expect("the path is correct");
}

Obviously, it will improve the performance only in the long run. If the regexes are always different it will not help.

I scribbled an example of how it may be in this branch

In the given benchmark the duration gets boosted 5 times to roughly 25 µs on my laptop.
Looking into this thread where eventually the duration was 1,5 seems more or less plausible.

Let me now, if this approach works and I will implement it shortly.

@catalinstochita
Copy link
Contributor Author

Yes, is working fine. It would have been nice if serde_json::Value had a Regex field, I think in that case we would just added a "regex" rule and parse it directly, but this method works too

@besok
Copy link
Owner

besok commented Feb 7, 2024

Yeah, great. I will implement it in a week or two.

It would have been nice if serde_json::Value had a Regex field, I think in that case we would just added a "regex" rule and parse it directly, but this method works too

Well, it is beyond this repo unfortunately but even if not to me it would be not clear how it would solve the problem assuming:

  • one Value can have different regexes to be parsed with
  • one regex can be applied to the different Value instances
  • this will transfer the given problem of having a cache from regexes to having a cache of Values eventually which seems to be less static and harder to maintain

But maybe, I just missed something.

@besok
Copy link
Owner

besok commented Feb 7, 2024

So, the story:
Add a configuration for jspath which will accept the cache for regexes.
In the future, the configuration can be enriched with the other parameters.

The cache should be mutable(assuming refcell) and capable of working in multithreading (assuming Arc)
the regex cache should be used when the library calculates the regex operator and either create a new one and place it in the cache or take the existing one from the cache.
if you want, feel free to take it on. :)
I will be able to come back to it in a week or two.

@besok besok added the enhancement New feature or request label Feb 7, 2024
@besok besok mentioned this issue Feb 19, 2024
@besok besok closed this as completed in #62 Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants