get rid of JsonPathFinder and Boxes #58

xMAC94x · 2023-12-05T17:38:38Z

Hi @besok I started using your crate and got into performance issues, now I did some looks into it and tried to simplify the interface a bit.
Removed the JsonPathFinder and just had the methods there as stand-alone methods, like common in the rust env.

Want to get your opinion on this, what do you think ?

besok · 2023-12-06T10:28:23Z

Great. Thanks for the pull request. I will try to come back to it this week( I am currently on vacation without access to my laptop unfortunately:( )

Btw, getting rid of the given struct has boosted performance or it is rather due to redundancy of the layouts?

xMAC94x · 2023-12-06T20:22:25Z

I wanted to search in some existing json Value however i can't pass a &Value to the struct, so i would need to copy it.
One could think about extending the struct to also allow & but its way easier to get rid of it entirely, it's not doing too much anyways.

Additionally i think having a sleek api that makes it easy to reuse code and doesn't encouraging cloning is a good way to make it hard for consumers of this crate to missuse.
Ideally it should be straight forward to use, and the straight forward approach should come with no hidden cost.

But for now enjoy the vacation :)

besok · 2023-12-07T02:38:31Z

Yeah. thanks. :)

I agree the supplementary costs should be avoided, the struct JsonPathFinder was created simply intending to use it directly. like having two strings (JSON and js path) and calculating the path "on a fly".

Maybe yo are right. I will need to recall what the struct does now actually =)

besok · 2023-12-07T02:40:39Z

closes #59

besok

Sorry for the late response. Just came back.
To me looks good. Let's remove the finder.

besok · 2023-12-23T11:25:36Z

It has some fmt and clippy failures. You can fix it locally just kicking off the appropriate commands and correct the code

Norlock · 2024-05-15T10:43:25Z

@besok can you fix those maybe? I would like to use this otherwise I will fork it no problem but since its only clippy fixes its probably done quickly.

besok · 2024-05-15T21:34:37Z

There are merge conflicts since the aforementioned in the PR structure obtained a new field (cfg)!
I can take a look at the weekend and either resolve the conflicts or discard the pr :)

xMAC94x · 2024-05-17T10:25:32Z

Just had a look, it seems like #62 adds a cache for regex, imo this is is not necessary when doing it like in this PR.
You dont need a cache because you just use the existing regex, and this can cache the regex at the caller.

Simplifies this code and lets the caller do stuff more efficient.

I can do a rebase, but we first have to discuss what to do with those caching behavior.

besok · 2024-05-17T10:35:47Z

You dont need a cache because you just use the existing regex, and this can cache the regex at the caller.

I don't grasp fully what you mean tbh, can you give an example or expand the answer?

xMAC94x · 2024-05-17T16:26:11Z

I'll try to explain, I don't remember the internals from this crate as the PR is a bit older.

So mostly #61 complained about the crate beeing slow. And they even said what is slow, recompiling the regex.

By looking in this crate, example:

// https://github.com/besok/jsonpath-rust/blob/main/src/lib.rs#L150
let json: Value = serde_json::from_str("{}").unwrap();
let path = json.path("$..book[?(@.author size 10)].title").unwrap();

Lets assume the following speed behavior:

creating a json object: slow
creating a json path object: slow
actually searching: fast

line 1 does step 1
line 2 does step 2 and 3 combined.

When people complain about something beeing slow, they prob do it for alot of things, e.g. have the same json data and search different things inside, or have different json data but search the same thing inside. (or even both, the important stuff is, often one thing stays the same).

The way this is currently build does not profit when one thing stays constant, it has to redo step 1 or step 2 in either case. Those are the slow steps.

With this PR:

let json: Value = serde_json::from_str("{}")?;
let path: JsonPathInst = JsonPathInst::from_str("$..book[?(@.author size 10)].title")?;

let v = find_slice(&path, &json);

You see that we need an additional line, one line per step. And you'll notice that the find_slice actually just takes references, that means it does not consume the data as its only accessed read-only.

This allows a cunsumer of this librarby, like the person from #61 to precompile the regex easily, and reuse it as often as they want. They can even go multihtreading if they need.

The idea is: rather to inject the cache behavior into the whole stack: just make this library work simply, and thus allows devs to reuse (=effectively building their own cache) regex that are already compiled

This reverts commit a07c7b6.

- JsonPathFinder interface does not really benefit from storing the json or path internally - trying to get rid of the Box<> that is used inside of JsonPathFinder

equal bench with reuse time: [510.30 ns 512.16 ns 514.26 ns] equal bench without reuse time: [21.436 µs 21.456 µs 21.479 µs] regex bench with reuse time: [58.875 µs 58.925 µs 58.975 µs] regex bench without reuse time: [85.324 µs 85.416 µs 85.517 µs] JsonPathInst generation time: [23.988 µs 24.019 µs 24.052 µs]

xMAC94x · 2024-05-17T17:38:58Z

Okay, just had some refactoring and mid into it i noted what we were talking about, the regex is actually executed as part of the search on not when creating the jsonpathinst. thus getting rid of the cache is a regression here.

Though anyway maybe I can convince you, rather than building a cache, evaluate the respective regexs on creation of the jsonpathinst.

Getting rid of all the boxes with this PR improves the default case on my machine from:
139.82 µs down to 86.049 µs
When keeping the data, even down to 56.392 µs.
Thats not quite yet the 24.638 µs

But on the other hand, every other operation, that is not a regex, gets a speeup from
23.448 µs to 512.16 ns (e.g. see equal benchmark).

When combining both we might even see sub us for the regex case

besok · 2024-05-17T18:11:36Z

The introduced cache handles a specific situation.
The compilation of regex is a voracious operation and takes quite some time.

Thus, if a user has either multithreading or a stream of queries with the same regex in the long-living app,
the cache will boost performance. It can be also serialized and spreaded further (I doubt it needs to anybody but the opportunity exists )

Though anyway maybe I can convince you, rather than building a cache, evaluate the respective regexs on creation of the jsonpathinst.

This is an unconnected process or at least I don't see a bond.

Still, I would not say the cache and removing boxes and other auxiliary structures is a mutually exclusive process.
And grooming the API and removing deprecated (so to say) structures also would be good to merge into.
Therefore, I would not remove the cache but instead introduce a new method like find_slice_with_cfg and so on.

xMAC94x · 2024-05-17T19:47:15Z

Here is some ugly code that statically compiles the regex, when the JsonPathInst is generated: e527958

compilation time goes up by:
JsonPathInst generation time: [76.777 µs 76.844 µs 76.915 µs]
change: [+2.6997% +3.0563% +3.3790%] (p = 0.00 < 0.05)
Performance has regressed.

while now execution time improves by:
regex bench with reuse time: [591.67 ns 594.80 ns 598.18 ns]
change: [-98.975% -98.969% -98.964%] (p = 0.00 < 0.05)
Performance has improved.

this is an other 50-times fold improvement.

The only thing i have: It currently only works with static regex, idk if dynamic regex is a thing.
Do you know if dynamic regex is a thing. I also dont know if there is an less-ugly way to encode it rather than in the FilterSign.

besok · 2024-05-17T20:09:30Z

But that is precisely the case.
We parse a string that represents regex.
The compilation of this string into the particular regular expression is what the cache is summoned to solve.

In your example, you precompile it beforehand outside of the jspath.

xMAC94x · 2024-06-03T07:45:16Z

sorry, non-native speaker here. Is this now a good thing that its compiled with the jspath or a bad thing ?

besok · 2024-06-03T08:02:22Z

It is not bad or good. It does not solve the initial problem.
The problem is the following:
Given - we have 1K requests for jspath like $.[?(@.field == '[a-zA-Z]')].
Target - compile it only once and then use already compiled regex.

You precompile a regular expression outside of the library. But it should be done inside the library because we parse a string into regex inside the library. So, you need something inside the library that will keep track of the regexes that are already compiled.

xMAC94x · 2024-06-03T12:30:06Z

in that example:

let path = JsonPathInst::from_str('$.[?(@.field == '[a-zA-Z]')]').unwrap();
for _ in 0..1000 {
  // actually 1000 different jsons:
  let json = json!({
        "field":"abcXYZ",
  });
  let x = jsonpath_rust::find(&path, &json);
}

we can solve the problem with regex compilation in 2 ways:

add a cache that espeically only caches regexes and stores them.
just compile the regex when the path is created, that way we can leverage both effects. the regex is only compiled ones, but the rest is only compiled once. Even higher speed increase. no need for an internal hashmap with expensive hashmap lookups.

Or do I get it wrong somehow ?

besok · 2024-06-03T14:16:00Z

I think in general you are correct.

add a cache that espeically only caches regexes and stores them.

Yeah, that is how it is implemented now.

just compile the regex when the path is created, that way we can leverage both effects. the regex is only compiled ones, but the rest is only compiled once. Even higher speed increase. no need for an internal hashmap with expensive hashmap lookups.

Maybe, I missed something but a couple of points.
You want the users to implement a so to say, user cache with jsonpathinst instead of regexes, don't you?
It will work if you have the same queries but in general, a user can have something like that:

$.[?(@.field == '[a-zA-Z]')]
$.field1[?(@.field == '[a-zA-Z]')]
$.[?(@.field == '[a-zA-Z]')].field2.length
$.[?(@ == '[a-zA-Z]')]

and so on. The queries will be different and the regex will be calculated again.

xMAC94x · 2024-06-04T08:11:19Z

Good point, I haven't though about that yet.

I don't necessarily want the user to build a cache in the sense of a HashMap.
But I want the user be able to reuse jsonpathinst for multiple json data easily. That has a similary effect on speed (and technically could also be called a cache) because it doesn't need to be reevaluated on each .find()

Mentioning different Json-Paths but with the same regex inside is truely something we should consider.
Such a scenario would definitely benefit from a cache. The question now is how common is that for users of this crate. Because while a cache helps if you have many matches, it also adds costs for all other queries that are not different-json-path-containing-same-regex.

It now gets a bit tricky, because we have to think how a user of this crates uses it and find a good tradeoff.

IMO I would prob see a user have some different-json-path-containing-same-regex like paths, but I don't see them scaling. what I want to say is, maybe they have 10 of this queries, but it would rather be uncommon that they have thousands.

I would probably expect most use cases are either: The application has a fixed number of paths and many scaling data, or the application has a fixed number of data and is checking many paths. In both cases, beeing able to reuse via reference would provide a huge benefit (see above -98,9%, that 100x faster).

That would allow, even if there are hundreds of jsonpath where the same regex is used, we would still be faster in the end.

The only exception is probably a application where every jsonpath, without exception is different from each other but share the same regex

besok · 2024-06-04T08:31:36Z

Very good point. I don't know why but the idea of caching the whole jspath slipped through my fingers. I will think about it in the evening a bit, searching for pitfalls.

besok · 2024-06-04T21:43:52Z

I have taken a look to the benches and truly it seems the benefit of having a regex cache is leveled by the alternative of caching jspathinst itself.

Thanks for the great PR!
The solution is definitely better and cleaner than the one with the regex cache.

xMAC94x mentioned this pull request Dec 11, 2023

Revise a need of JsonPathFinder and groom the API in general. #59

Open

besok approved these changes Dec 22, 2023

View reviewed changes

xMAC94x force-pushed the xMAC94x/functions branch from ff66e94 to 7cd697c Compare May 17, 2024 10:27

xMAC94x added 2 commits May 17, 2024 18:30

Revert "61 regex perf (besok#62)"

8254333

This reverts commit a07c7b6.

try to get rid of JsonPathFinder and Boxes

994a3ec

- JsonPathFinder interface does not really benefit from storing the json or path internally - trying to get rid of the Box<> that is used inside of JsonPathFinder

xMAC94x force-pushed the xMAC94x/functions branch from 7cd697c to beabd2d Compare May 17, 2024 17:00

xMAC94x added 2 commits May 17, 2024 19:05

some clippy fixes

46d145b

xMAC94x force-pushed the xMAC94x/functions branch from beabd2d to e85553c Compare May 17, 2024 17:37

besok merged commit 1d375e6 into besok:main Jun 4, 2024
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

get rid of JsonPathFinder and Boxes #58

get rid of JsonPathFinder and Boxes #58

xMAC94x commented Dec 5, 2023

besok commented Dec 6, 2023

xMAC94x commented Dec 6, 2023 •

edited

Loading

besok commented Dec 7, 2023

besok commented Dec 7, 2023

besok left a comment

besok commented Dec 23, 2023

Norlock commented May 15, 2024

besok commented May 15, 2024 •

edited

Loading

xMAC94x commented May 17, 2024 •

edited

Loading

besok commented May 17, 2024

xMAC94x commented May 17, 2024 •

edited

Loading

xMAC94x commented May 17, 2024

besok commented May 17, 2024 •

edited

Loading

xMAC94x commented May 17, 2024

besok commented May 17, 2024

xMAC94x commented Jun 3, 2024

besok commented Jun 3, 2024

xMAC94x commented Jun 3, 2024 •

edited

Loading

besok commented Jun 3, 2024

xMAC94x commented Jun 4, 2024 •

edited

Loading

besok commented Jun 4, 2024

besok commented Jun 4, 2024 •

edited

Loading

get rid of JsonPathFinder and Boxes #58

get rid of JsonPathFinder and Boxes #58

Conversation

xMAC94x commented Dec 5, 2023

besok commented Dec 6, 2023

xMAC94x commented Dec 6, 2023 • edited Loading

besok commented Dec 7, 2023

besok commented Dec 7, 2023

besok left a comment

Choose a reason for hiding this comment

besok commented Dec 23, 2023

Norlock commented May 15, 2024

besok commented May 15, 2024 • edited Loading

xMAC94x commented May 17, 2024 • edited Loading

besok commented May 17, 2024

xMAC94x commented May 17, 2024 • edited Loading

xMAC94x commented May 17, 2024

besok commented May 17, 2024 • edited Loading

xMAC94x commented May 17, 2024

besok commented May 17, 2024

xMAC94x commented Jun 3, 2024

besok commented Jun 3, 2024

xMAC94x commented Jun 3, 2024 • edited Loading

besok commented Jun 3, 2024

xMAC94x commented Jun 4, 2024 • edited Loading

besok commented Jun 4, 2024

besok commented Jun 4, 2024 • edited Loading

xMAC94x commented Dec 6, 2023 •

edited

Loading

besok commented May 15, 2024 •

edited

Loading

xMAC94x commented May 17, 2024 •

edited

Loading

xMAC94x commented May 17, 2024 •

edited

Loading

besok commented May 17, 2024 •

edited

Loading

xMAC94x commented Jun 3, 2024 •

edited

Loading

xMAC94x commented Jun 4, 2024 •

edited

Loading

besok commented Jun 4, 2024 •

edited

Loading