Databroker performance improvements in Get service call #26

rafaeling · 2024-05-06T15:24:04Z

This change means a 8 times faster improvement for the Get method, which drops to 7.8 ms for 5000 requests compared to the main branch implementation, which is 60 ms.

See the wildcard exceptions that have been made obsolete by this change:
https://github.com/eclipse-kuksa/kuksa-databroker/blob/9ec27697af5d7594245e5c7cd5ebc2eb687abb80/doc/wildcard_matching.md

codecov-commenter · 2024-05-06T15:26:13Z

Codecov Report

Attention: Patch coverage is 75.19026% with 163 lines in your changes missing coverage. Please review.

Project coverage is 50.92%. Comparing base (963a297) to head (29a968b).

Files	Patch %	Lines
databroker/src/grpc/kuksa_val_v1/val.rs	71.42%	82 Missing ⚠️
databroker/src/glob.rs	78.04%	81 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #26      +/-   ##
==========================================
+ Coverage   49.40%   50.92%   +1.52%     
==========================================
  Files          31       31              
  Lines       11285    11878     +593     
==========================================
+ Hits         5575     6049     +474     
- Misses       5710     5829     +119

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

SebastianSchildt · 2024-05-08T12:20:36Z

This may be faster, but it breaks a lot in the existing API as well.

before


Test Client> getValue Vehicle.CurrentLocation 
[
    {
        "path": "Vehicle.CurrentLocation.GNSSReceiver.MountingPosition.X"
    },
    {
        "path": "Vehicle.CurrentLocation.Latitude"
    },
    {
        "path": "Vehicle.CurrentLocation.VerticalAccuracy"
    },
    {
        "path": "Vehicle.CurrentLocation.HorizontalAccuracy"
    },
    {
        "path": "Vehicle.CurrentLocation.Heading"
    },
    {
        "path": "Vehicle.CurrentLocation.Timestamp"
    },
    {
        "path": "Vehicle.CurrentLocation.GNSSReceiver.MountingPosition.Z"
    },
    {
        "path": "Vehicle.CurrentLocation.Longitude"
    },
    {
        "path": "Vehicle.CurrentLocation.GNSSReceiver.FixType"
    },
    {
        "path": "Vehicle.CurrentLocation.Altitude"
    },
    {
        "path": "Vehicle.CurrentLocation.GNSSReceiver.MountingPosition.Y"
    }
]

Test Client>

After

Test Client> getValue Vehi
EXCEPTION of type 'Exception' occurred with message: 'Wrong databroker version, please use a newer version'
To enable full traceback, run the following command: 'set debug true'
Test Client> getValue Vehi
EXCEPTION of type 'Exception' occurred with message: 'Wrong databroker version, please use a newer version'
To enable full traceback, run the following command: 'set debug true'
Test Client> getValue Vehicle.CurrentLocation
{
    "error": {
        "code": 404,
        "reason": "not_found",
        "message": "Path not found"
    },
    "errors": [
        {
            "path": "Vehicle.CurrentLocation",
            "error": {
                "code": 404,
                "reason": "not_found",
                "message": "Path not found"
            }
        }
    ]
}

Test Client>

(I guess breaking AutoComplete is another side effect of this)

So that would mean a major version bump and losing features

If RegExp (I guess the extending/matching with stars) is the performance bottle neck here, would it not be better, to only use RegExp if needed?

As in

if "*" in path do regexp path
else 
    doSameAsHere() and
          if leaf: return stuff
          if branch "regexp it"
         else "404"

That would just cost a bit of ifs, and I guess the heavy part of the regexps is, that you can't precompile (becasue they come form client). So implementing like above: A client that needs performance ONLY subscribes to Lead nodes and is fine,. Everybody else can knick themselves out

rafaeling · 2024-05-08T13:50:22Z

This was a hot fix to see if we could get some performance improvements.
We are still considering doing more research to keep the wildcard combined with good performance without these PR changes.
If the research results are not so promising, there is an additional idea of a new algorithm to partially support the wildcard only for the .* and .** cases while keeping this PR O(1) algorithm.
I will change it to Draft as it should not be merged yet.

One more thing regarding auto-completion, yes, it is good to have it and it has been considered to implement an extra service method for Metadata that will support wildcards, which means that the user would not need wildcard support for Get as all signals could be stored locally by the client once in advance.

SebastianSchildt · 2024-05-09T05:23:20Z

Yes, in terms of an updated/new API I am sure there may be better ways to split/do this

However, for the current API, we should not merge something into main, that changes behavior significantly (I mean this doesn"t reinterpret some unclarities, it would actually remove features contradict what is documented)

"Engineering" wise,I am not sure, but have you looked into whether "globbing" libraries are faster? Like i.e. https://github.com/devongovett/glob-match , I however lack the background to know if this is "fundamentally" faster then the regexp construction of state amchie and machting. And it also wil be a bandaid, because non-wildcard hashmap lookup shall be O(1) (with respect of number of signals, the hashing obviously would be O(l) with l being length of the path), and I doubt the most clever glob matching when backed by a "normal" map/list can ever be faster than O(n)..... But surprise me, if you can deliver O(1) we will probably take it :D

so tldr: I still think now and also in a future APIs ome kind of matching useful, but should be implemented in way that it is only "used when needed"

Off topic here, but since you mentioned it: I guess even when "discoverability" is in a metadata API a user might still want to subscribe to "a branch" (which can be seen as a "wildcard" sort of), otherwise something like the recorder from https://github.com/eclipse-kuksa/kuksa-csv-provider gets really hard, consider you want to generate a trace of everzthing running throug databroker (or a subbranch) but are not able to say "subscribe vehicle", but instead need to list/do 1000 individual paths

rafaeling · 2024-05-13T08:46:33Z

Trying out the library you posted, will test it and check the performance results.

rafaeling · 2024-05-21T08:39:14Z

Wildcard now uses glob-matching library https://github.com/devongovett/glob-match
Can be reviewed

databroker/src/broker.rs

databroker/src/grpc/kuksa_val_v1/val.rs

SebastianSchildt · 2024-06-05T03:33:33Z

I think, that sutff like

**.*.*.*.Position is not supported anymore is fine, does not seem very real world, however is something preventing us to keep this

Vehicle.Cabin.Sunroof working as before? I have not checked the implementation, but might something like "if isBranch ad .**" make sense somewhere?

Rationale is: the first case I have never used, and find it hard to believe anyone did, the second one I definitely use often at least via CLI

rafaeling · 2024-06-05T11:16:42Z

I think, that sutff like

**.*.*.*.Position is not supported anymore is fine, does not seem very real world, however is something preventing us to keep this

Vehicle.Cabin.Sunroof working as before? I have not checked the implementation, but might something like "if isBranch ad .**" make sense somewhere?

Rationale is: the first case I have never used, and find it hard to believe anyone did, the second one I definitely use often at least via CLI

Vehicle.Cabin.Sunroof is not supported anymore, instead Vehicle.Cabin.Sunroof.** or Vehicle.Cabin.Sunroof.* should be use.
As far as I understand, there is no way in Databroker to determine if a path is a branch or not; Databroker can only internally query whether a path is contained in the database or not.
Previously, it was supported because Regex does not have as many limitations as the glob matching library approach.

databroker/src/broker.rs

databroker/src/glob.rs

databroker/src/grpc/kuksa_val_v1/val.rs

SebastianSchildt · 2024-06-05T14:01:00Z

I think, that sutff like
**.*.*.*.Position is not supported anymore is fine, does not seem very real world, however is something preventing us to keep this
Vehicle.Cabin.Sunroof working as before? I have not checked the implementation, but might something like "if isBranch ad .**" make sense somewhere?
Rationale is: the first case I have never used, and find it hard to believe anyone did, the second one I definitely use often at least via CLI

Vehicle.Cabin.Sunroof is not supported anymore, instead Vehicle.Cabin.Sunroof.** or Vehicle.Cabin.Sunroof.* should be use. As far as I understand, there is no way in Databroker to determine if a path is a branch or not; Databroker can only internally query whether a path is contained in the database or not. Previously, it was supported because Regex does not have as many limitations as the glob matching library approach.

Ah, I remember, because "branches" are not really a thing in our internal data structure. In terms of keeping API compatibility, would something be feasible like

inputstr=`Vehicle.Cabin.Sunroof`

res = glob_the_shit_out_of_this(inputStr)

if res=no match
   res = glob_the_shit_out_of_this(inputStr+".**")

It should not really cost anything , when querying direct leaves.

I understand, for any kind of new API we would always use the "best" approch, but this is slightly modifying existing API behavior, so I would prefer to keep behavior changes to a minimum

rafaeling · 2024-06-05T16:10:03Z

I think, that sutff like
**.*.*.*.Position is not supported anymore is fine, does not seem very real world, however is something preventing us to keep this
Vehicle.Cabin.Sunroof working as before? I have not checked the implementation, but might something like "if isBranch ad .**" make sense somewhere?
Rationale is: the first case I have never used, and find it hard to believe anyone did, the second one I definitely use often at least via CLI

Vehicle.Cabin.Sunroof is not supported anymore, instead Vehicle.Cabin.Sunroof.** or Vehicle.Cabin.Sunroof.* should be use. As far as I understand, there is no way in Databroker to determine if a path is a branch or not; Databroker can only internally query whether a path is contained in the database or not. Previously, it was supported because Regex does not have as many limitations as the glob matching library approach.

Ah, I remember, because "branches" are not really a thing in our internal data structure. In terms of keeping API compatibility, would something be feasible like
inputstr=`Vehicle.Cabin.Sunroof`

res = glob_the_shit_out_of_this(inputStr)

if res=no match
   res = glob_the_shit_out_of_this(inputStr+".**")
It should not really cost anything , when querying direct leaves.

I understand, for any kind of new API we would always use the "best" approch, but this is slightly modifying existing API behavior, so I would prefer to keep behavior changes to a minimum

Implemented your suggestions. Now it is also compatible for branches use cases like Vehicle.Cabin.Sunroof
Please test it, meanwhile I will try fix the format errors as soon as possible.

SebastianSchildt · 2024-06-06T01:57:53Z

@rafaeling sounds good

It would be good, if the conflicts are resolved, because that seems to prevent CI runs, so currently can not lazily test by just pulling the docker image :D

SebastianSchildt · 2024-06-06T03:06:46Z

Built manually and checked behavior. Looking good to me!

databroker/src/glob.rs

databroker/src/grpc/kuksa_val_v1/val.rs

databroker/src/glob.rs

…b for wildcard handling

SebastianSchildt

lgtm 🍭

rafaeling marked this pull request as draft May 8, 2024 13:51

rafaeling force-pushed the databroker_performance_improvements branch from 820bb38 to 010fa74 Compare May 13, 2024 08:44

rafaeling marked this pull request as ready for review May 13, 2024 08:45

rafaeling force-pushed the databroker_performance_improvements branch from 43fe4e4 to bd14dcd Compare May 18, 2024 16:13

rafaeling force-pushed the databroker_performance_improvements branch 3 times, most recently from 635f6a6 to f129492 Compare May 21, 2024 15:49

SebastianSchildt requested a review from argerus May 24, 2024 06:55

rafaeling force-pushed the databroker_performance_improvements branch from f99747b to 6dff684 Compare May 29, 2024 07:23

lukasmittag reviewed Jun 4, 2024

View reviewed changes

databroker/src/broker.rs Show resolved Hide resolved

databroker/src/grpc/kuksa_val_v1/val.rs Outdated Show resolved Hide resolved

databroker/src/grpc/kuksa_val_v1/val.rs Outdated Show resolved Hide resolved

argerus requested changes Jun 5, 2024

View reviewed changes

argerus reviewed Jun 5, 2024

View reviewed changes

databroker/src/grpc/kuksa_val_v1/val.rs Outdated Show resolved Hide resolved

rafaeling force-pushed the databroker_performance_improvements branch 2 times, most recently from bb13826 to aea1aea Compare June 6, 2024 08:41

argerus requested changes Jun 6, 2024

View reviewed changes

argerus requested changes Jun 7, 2024

View reviewed changes

databroker/src/glob.rs Outdated Show resolved Hide resolved

databroker/src/glob.rs Outdated Show resolved Hide resolved

databroker/src/glob.rs Outdated Show resolved Hide resolved

argerus approved these changes Jun 7, 2024

View reviewed changes

rafaeling force-pushed the databroker_performance_improvements branch from bd47ae4 to b57e74f Compare June 7, 2024 13:25

Improve Get call performance by replacing regex with glob matching li…

29a968b

…b for wildcard handling

rafaeling force-pushed the databroker_performance_improvements branch from b57e74f to 29a968b Compare June 7, 2024 13:28

SebastianSchildt approved these changes Jun 8, 2024

View reviewed changes

SebastianSchildt merged commit 1f1f1a1 into eclipse-kuksa:main Jun 8, 2024
23 checks passed

rafaeling mentioned this pull request Jun 12, 2024

Prepare Release 0.4.6 #42

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Databroker performance improvements in Get service call #26

Databroker performance improvements in Get service call #26

rafaeling commented May 6, 2024 •

edited

Loading

codecov-commenter commented May 6, 2024 •

edited by codecov bot

Loading

SebastianSchildt commented May 8, 2024 •

edited

Loading

rafaeling commented May 8, 2024 •

edited

Loading

SebastianSchildt commented May 9, 2024

rafaeling commented May 13, 2024

rafaeling commented May 21, 2024

SebastianSchildt commented Jun 5, 2024

rafaeling commented Jun 5, 2024 •

edited

Loading

SebastianSchildt commented Jun 5, 2024

rafaeling commented Jun 5, 2024

SebastianSchildt commented Jun 6, 2024

SebastianSchildt commented Jun 6, 2024

SebastianSchildt left a comment

Databroker performance improvements in Get service call #26

Databroker performance improvements in Get service call #26

Conversation

rafaeling commented May 6, 2024 • edited Loading

codecov-commenter commented May 6, 2024 • edited by codecov bot Loading

Codecov Report

SebastianSchildt commented May 8, 2024 • edited Loading

rafaeling commented May 8, 2024 • edited Loading

SebastianSchildt commented May 9, 2024

rafaeling commented May 13, 2024

rafaeling commented May 21, 2024

SebastianSchildt commented Jun 5, 2024

rafaeling commented Jun 5, 2024 • edited Loading

SebastianSchildt commented Jun 5, 2024

rafaeling commented Jun 5, 2024

SebastianSchildt commented Jun 6, 2024

SebastianSchildt commented Jun 6, 2024

SebastianSchildt left a comment

Choose a reason for hiding this comment

rafaeling commented May 6, 2024 •

edited

Loading

codecov-commenter commented May 6, 2024 •

edited by codecov bot

Loading

SebastianSchildt commented May 8, 2024 •

edited

Loading

rafaeling commented May 8, 2024 •

edited

Loading

rafaeling commented Jun 5, 2024 •

edited

Loading