get rid of JsonPathFinder and Boxes #58
Great. Thanks for the pull request. I will try to come back to it this week (I am currently on vacation without access to my laptop, unfortunately :( ). Btw, did getting rid of the given struct boost performance, or is the gain rather due to the redundancy of the layouts?
I wanted to search in some existing json Additionally, I think having a sleek API that makes it easy to reuse code and doesn't encourage cloning is a good way to make it hard for consumers of this crate to misuse it. But for now, enjoy the vacation :)
Yeah, thanks. :) I agree the supplementary costs should be avoided, the struct Maybe you are right. I will need to recall what the struct actually does now =)
closes #59
Sorry for the late response. Just came back.
It looks good to me. Let's remove the finder.
It has some fmt and clippy failures. You can fix them locally by running the appropriate commands and correcting the code.
@besok can you fix those, maybe? I would like to use this. Otherwise I will fork it, no problem, but since it's only clippy fixes, it's probably done quickly.
There are merge conflicts, since the struct mentioned in the PR has gained a new field (cfg)!
Just had a look: it seems like #62 adds a cache for regexes. IMO this is not necessary when doing it like in this PR, which simplifies the code and lets the caller do things more efficiently. I can do a rebase, but we first have to discuss what to do with that caching behavior.
ff66e94 to 7cd697c
I don't fully grasp what you mean, tbh. Can you give an example or expand on the answer?
I'll try to explain. I don't remember the internals of this crate well, as the PR is a bit older. Mostly, #61 complained about the crate being slow, and they even said what is slow: recompiling the regex. Looking at this crate, for example:
    // https://github.com/besok/jsonpath-rust/blob/main/src/lib.rs#L150
    let json: Value = serde_json::from_str("{}").unwrap();
    let path = json.path("$..book[?(@.author size 10)].title").unwrap();
Let's assume the following speed behavior:
line 1 does step 1
When people complain about something being slow, they probably do it for a lot of things, e.g. they have the same JSON data and search for different things inside, or they have different JSON data but search for the same thing inside (or even both; the important point is that often one thing stays the same). The way this is currently built does not profit when one thing stays constant: it has to redo step 1 or step 2 in either case. Those are the slow steps.
With this PR:
    let json: Value = serde_json::from_str("{}")?;
    let path: JsonPathInst = JsonPathInst::from_str("$..book[?(@.author size 10)].title")?;
    let v = find_slice(&path, &json);
You see that we need an additional line, one line per step. And you'll notice that the This allows a consumer of this library, like the person from #61, to precompile the regex easily and reuse it as often as they want. They can even go multithreading if they need to. The idea is: rather than injecting the cache behavior into the whole stack, just make this library work simply, and thus allow devs to reuse (= effectively build their own cache of) regexes that are already compiled.
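The reuse pattern described in this comment can be sketched without the crate itself. What follows is a minimal, std-only sketch under stated assumptions: `ToyPath`, `find`, and `search_all` are hypothetical stand-ins (not jsonpath-rust's API), and the "documents" are plain string maps rather than real JSON. It only shows that a borrowing API lets the caller pay the parsing cost once.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// Hypothetical stand-in for a parsed path: a single key to look up.
/// (A real JsonPathInst is far richer; this only models "parse once".)
#[derive(Debug, PartialEq)]
struct ToyPath {
    key: String,
}

impl FromStr for ToyPath {
    type Err = String;

    /// The "expensive" step: parse a query of the form `$.key`.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.strip_prefix("$.") {
            Some(key) if !key.is_empty() => Ok(ToyPath { key: key.to_string() }),
            _ => Err(format!("unsupported path: {}", s)),
        }
    }
}

/// The cheap step: borrows both the path and the document, clones nothing.
fn find<'a>(path: &ToyPath, doc: &'a HashMap<String, String>) -> Option<&'a str> {
    doc.get(&path.key).map(String::as_str)
}

/// Parse the query once, then apply the same parsed path to every document.
fn search_all(
    query: &str,
    docs: &[HashMap<String, String>],
) -> Result<Vec<Option<String>>, String> {
    let path = ToyPath::from_str(query)?; // parsed exactly once
    Ok(docs
        .iter()
        .map(|doc| find(&path, doc).map(str::to_owned))
        .collect())
}
```

Because `find` takes `&ToyPath`, the caller decides how long the parsed path lives and how often it is reused; nothing in the function forces a re-parse or a clone.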
This reverts commit a07c7b6.
- JsonPathFinder interface does not really benefit from storing the json or path internally
- trying to get rid of the Box<> that is used inside of JsonPathFinder
7cd697c to beabd2d
equal bench with reuse       time: [510.30 ns 512.16 ns 514.26 ns]
equal bench without reuse    time: [21.436 µs 21.456 µs 21.479 µs]
regex bench with reuse       time: [58.875 µs 58.925 µs 58.975 µs]
regex bench without reuse    time: [85.324 µs 85.416 µs 85.517 µs]
JsonPathInst generation      time: [23.988 µs 24.019 µs 24.052 µs]
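To make the benchmark figures easier to compare, here is a small arithmetic sketch. It assumes the middle value in each bracket is the representative estimate and converts everything to nanoseconds; the constant and function names are made up for illustration.

```rust
// Middle values from the benchmark output, converted to nanoseconds.
const EQUAL_WITH_REUSE_NS: f64 = 512.16;
const EQUAL_WITHOUT_REUSE_NS: f64 = 21_456.0;
const REGEX_WITH_REUSE_NS: f64 = 58_925.0;
const REGEX_WITHOUT_REUSE_NS: f64 = 85_416.0;
const PATH_GENERATION_NS: f64 = 24_019.0;

/// How many times faster the reused case is than the non-reused one.
fn speedup(without_reuse_ns: f64, with_reuse_ns: f64) -> f64 {
    without_reuse_ns / with_reuse_ns
}

/// Time saved per query by reusing the parsed path, in nanoseconds.
fn saved_ns(without_reuse_ns: f64, with_reuse_ns: f64) -> f64 {
    without_reuse_ns - with_reuse_ns
}
```

With these numbers, reuse works out to roughly 41.9x for the equality bench and roughly 1.45x for the regex bench. The absolute savings (about 20.9 µs and 26.5 µs) are both in the neighbourhood of the 24.0 µs JsonPathInst generation time, which would be consistent with most of the saving coming from not regenerating the path.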
beabd2d to e85553c
Okay, I just did some refactoring, and midway through I noticed what we were talking about: the regex is actually executed as part of the search and not when creating the JsonPathInst. Thus, getting rid of the cache is a regression here. Maybe I can convince you anyway: rather than building a cache, evaluate the respective regexes on creation of the JsonPathInst. Getting rid of all the boxes with this PR improves the default case on my machine from: But on the other hand, every other operation that is not a regex gets a speedup from When combining both, we might even see sub-µs for the regex case.
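The suggestion to evaluate the regex when the path instance is created, rather than during the search, can be sketched as follows. This is a std-only illustration under stated assumptions: `CharClass` is a toy stand-in for a compiled regex (a lookup table over ASCII characters), and `FilterPath` is a hypothetical path type, not the crate's JsonPathInst.

```rust
use std::collections::HashMap;

/// Toy stand-in for a compiled regex: a lookup table over ASCII characters.
/// Building the table plays the role of the expensive compilation step.
struct CharClass {
    allowed: [bool; 128],
}

impl CharClass {
    /// "Compile" a list of allowed characters, e.g. "abc".
    fn compile(chars: &str) -> Self {
        let mut allowed = [false; 128];
        for c in chars.chars().filter(|c| c.is_ascii()) {
            allowed[c as usize] = true;
        }
        CharClass { allowed }
    }

    /// True if `s` is non-empty and every character belongs to the class.
    fn is_match(&self, s: &str) -> bool {
        !s.is_empty() && s.chars().all(|c| c.is_ascii() && self.allowed[c as usize])
    }
}

/// Hypothetical path with a single filter. The matcher is built in `new`,
/// so `select` never compiles anything.
struct FilterPath {
    field: String,
    matcher: CharClass,
}

impl FilterPath {
    fn new(field: &str, allowed_chars: &str) -> Self {
        FilterPath {
            field: field.to_string(),
            matcher: CharClass::compile(allowed_chars), // compiled once, here
        }
    }

    /// Return the documents whose `field` value matches the precompiled class.
    fn select<'a>(
        &self,
        docs: &'a [HashMap<String, String>],
    ) -> Vec<&'a HashMap<String, String>> {
        docs.iter()
            .filter(|d| d.get(&self.field).map_or(false, |v| self.matcher.is_match(v)))
            .collect()
    }
}
```

The design choice is that the compilation cost is tied to the lifetime of the path object: as long as the caller keeps the `FilterPath` around, every search reuses the same matcher without any global cache.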
The introduced cache handles a specific situation. Thus, if a user has either multithreading or a stream of queries with the same regex in the long-living app,
This is an unconnected process, or at least I don't see a connection. Still, I would not say that the cache and the removal of boxes and other auxiliary structures are mutually exclusive.
Here is some ugly code that statically compiles the regex, when the compilation time goes up by: while now execution time improves by: this is another 50-fold improvement. The only caveat I have: it currently only works with static regexes; idk if dynamic regex is a thing.
But that is precisely the case. In your example, you precompile it beforehand outside of the jspath.
Sorry, non-native speaker here. Is it now a good thing that it's compiled with the jspath, or a bad thing?
It is not bad or good. It does not solve the initial problem. You precompile a regular expression outside of the library. But it should be done inside the library because we parse a string into regex inside the library. So, you need something inside the library that will keep track of the regexes that are already compiled.
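The idea of "something inside the library that keeps track of the regexes that are already compiled" could look roughly like the sketch below. It is std-only and makes loud assumptions: `Compiled` is a placeholder for a real compiled regex, `PatternCache` and its methods are hypothetical names, and this is not a description of how #62 implements its cache.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Placeholder for a compiled regex; a real cache would store the regex type.
#[derive(Debug)]
struct Compiled {
    pattern: String,
}

/// Cache keyed by the pattern source text, shareable between threads.
#[derive(Default)]
struct PatternCache {
    compiled: Mutex<HashMap<String, Arc<Compiled>>>,
    misses: AtomicUsize, // counts "compilations", for demonstration only
}

impl PatternCache {
    /// Return the cached entry for `pattern`, compiling it on first use.
    fn get_or_compile(&self, pattern: &str) -> Arc<Compiled> {
        let mut map = self.compiled.lock().unwrap();
        if let Some(hit) = map.get(pattern) {
            return Arc::clone(hit);
        }
        // Miss: "compile" once and remember the result.
        self.misses.fetch_add(1, Ordering::Relaxed);
        let fresh = Arc::new(Compiled { pattern: pattern.to_string() });
        map.insert(pattern.to_string(), Arc::clone(&fresh));
        fresh
    }

    /// Number of patterns that had to be compiled so far.
    fn compilations(&self) -> usize {
        self.misses.load(Ordering::Relaxed)
    }
}
```

A cache like this pays off when the same pattern text shows up in many different queries; the price is a lock and a hash lookup on every use, plus state that lives inside the library.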
In that example:
    let path = JsonPathInst::from_str("$.[?(@.field == '[a-zA-Z]')]").unwrap();
    for _ in 0..1000 {
        // actually 1000 different JSON documents:
        let json = json!({
            "field": "abcXYZ",
        });
        let x = jsonpath_rust::find(&path, &json);
    }
we can solve the problem with regex compilation in 2 ways:
Or do I get it wrong somehow?
I think in general you are correct.
Yeah, that is how it is implemented now.
Maybe I missed something, but a couple of points.
and so on. The queries will be different, and the regex will be compiled again.
Good point, I haven't thought about that yet. I don't necessarily want the user to build a cache in the sense of a HashMap. Different JSON paths with the same regex inside are truly something we should consider. It now gets a bit tricky, because we have to think about how a user of this crate uses it and find a good tradeoff. IMO I would prob see a user have some I would probably expect that most use cases are one of two kinds: the application has a fixed number of paths and a growing amount of data, or the application has a fixed amount of data and is checking many paths. In both cases, being able to reuse via reference would provide a huge benefit (see above: -98.9%, that is close to 100x faster). That would mean that, even if there are hundreds of JSON paths in which the same regex is used, we would still be faster in the end. The only exception is probably an application where every JSON path, without exception, is different from all the others but they share the same regex.
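Reuse via reference also composes with threads, which was mentioned earlier in the conversation. The sketch below is a std-only illustration with assumptions stated plainly: `PrefixQuery` is a hypothetical "parsed query" (it has nothing to do with JSON paths), and `std::thread::scope` requires Rust 1.63 or newer. One parsed query is shared by plain reference across worker threads, with no cache and no cloning.

```rust
use std::thread;

/// Hypothetical parsed query: counts words that start with a given prefix.
struct PrefixQuery {
    prefix: String,
}

impl PrefixQuery {
    /// Stands in for the one-time parsing step.
    fn parse(query: &str) -> Self {
        PrefixQuery { prefix: query.trim().to_string() }
    }

    /// The per-document search step; only borrows `self` and the document.
    fn count(&self, doc: &str) -> usize {
        doc.split_whitespace()
            .filter(|word| word.starts_with(self.prefix.as_str()))
            .count()
    }
}

/// Run the same parsed query against every document, one thread per document.
/// Results come back in the same order as `docs`.
fn count_in_parallel(query: &PrefixQuery, docs: &[String]) -> Vec<usize> {
    thread::scope(|s| {
        let handles: Vec<_> = docs
            .iter()
            .map(|doc| s.spawn(move || query.count(doc)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Spawning one thread per document is only for brevity; the point is that a query type without interior mutability can be shared as `&PrefixQuery` without any synchronization.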
Very good point. I don't know why, but the idea of caching the whole jspath slipped through my fingers. I will think about it a bit in the evening, searching for pitfalls.
I have taken a look at the benches, and it truly seems the benefit of having a regex cache is matched by the alternative of caching the JsonPathInst itself. Thanks for the great PR!
Hi @besok, I started using your crate and ran into performance issues, so I looked into it and tried to simplify the interface a bit.
I removed the JsonPathFinder and turned its methods into stand-alone functions, as is common in the Rust ecosystem.
I would like to get your opinion on this. What do you think?