Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Databroker performance improvements in Get service call #26

Conversation

rafaeling
Copy link
Contributor

@rafaeling rafaeling commented May 6, 2024

This change means a 8 times faster improvement for the Get method, which drops to 7.8 ms for 5000 requests compared to the main branch implementation, which is 60 ms.

See the wildcard exceptions that have been made obsolete by this change:
https://github.com/eclipse-kuksa/kuksa-databroker/blob/9ec27697af5d7594245e5c7cd5ebc2eb687abb80/doc/wildcard_matching.md

@codecov-commenter
Copy link

codecov-commenter commented May 6, 2024

Codecov Report

Attention: Patch coverage is 75.19026% with 163 lines in your changes missing coverage. Please review.

Project coverage is 50.92%. Comparing base (963a297) to head (29a968b).

Files Patch % Lines
databroker/src/grpc/kuksa_val_v1/val.rs 71.42% 82 Missing ⚠️
databroker/src/glob.rs 78.04% 81 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #26      +/-   ##
==========================================
+ Coverage   49.40%   50.92%   +1.52%     
==========================================
  Files          31       31              
  Lines       11285    11878     +593     
==========================================
+ Hits         5575     6049     +474     
- Misses       5710     5829     +119     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@SebastianSchildt
Copy link
Contributor

SebastianSchildt commented May 8, 2024

This may be faster, but it breaks a lot in the existing API as well.

before


Test Client> getValue Vehicle.CurrentLocation 
[
    {
        "path": "Vehicle.CurrentLocation.GNSSReceiver.MountingPosition.X"
    },
    {
        "path": "Vehicle.CurrentLocation.Latitude"
    },
    {
        "path": "Vehicle.CurrentLocation.VerticalAccuracy"
    },
    {
        "path": "Vehicle.CurrentLocation.HorizontalAccuracy"
    },
    {
        "path": "Vehicle.CurrentLocation.Heading"
    },
    {
        "path": "Vehicle.CurrentLocation.Timestamp"
    },
    {
        "path": "Vehicle.CurrentLocation.GNSSReceiver.MountingPosition.Z"
    },
    {
        "path": "Vehicle.CurrentLocation.Longitude"
    },
    {
        "path": "Vehicle.CurrentLocation.GNSSReceiver.FixType"
    },
    {
        "path": "Vehicle.CurrentLocation.Altitude"
    },
    {
        "path": "Vehicle.CurrentLocation.GNSSReceiver.MountingPosition.Y"
    }
]

Test Client>       

After

Test Client> getValue Vehi
EXCEPTION of type 'Exception' occurred with message: 'Wrong databroker version, please use a newer version'
To enable full traceback, run the following command: 'set debug true'
Test Client> getValue Vehi
EXCEPTION of type 'Exception' occurred with message: 'Wrong databroker version, please use a newer version'
To enable full traceback, run the following command: 'set debug true'
Test Client> getValue Vehicle.CurrentLocation
{
    "error": {
        "code": 404,
        "reason": "not_found",
        "message": "Path not found"
    },
    "errors": [
        {
            "path": "Vehicle.CurrentLocation",
            "error": {
                "code": 404,
                "reason": "not_found",
                "message": "Path not found"
            }
        }
    ]
}

Test Client> 

(I guess breaking AutoComplete is another side effect of this)

So that would mean a major version bump and losing features

If RegExp (I guess the extending/matching with stars) is the performance bottle neck here, would it not be better, to only use RegExp if needed?

As in

if "*" in path do regexp path
else 
    doSameAsHere() and
          if leaf: return stuff
          if branch "regexp it"
         else "404"

That would just cost a bit of ifs, and I guess the heavy part of the regexps is, that you can't precompile (becasue they come form client). So implementing like above: A client that needs performance ONLY subscribes to Lead nodes and is fine,. Everybody else can knick themselves out

@rafaeling
Copy link
Contributor Author

rafaeling commented May 8, 2024

This was a hot fix to see if we could get some performance improvements.
We are still considering doing more research to keep the wildcard combined with good performance without these PR changes.
If the research results are not so promising, there is an additional idea of a new algorithm to partially support the wildcard only for the .* and .** cases while keeping this PR O(1) algorithm.
I will change it to Draft as it should not be merged yet.

One more thing regarding auto-completion, yes, it is good to have it and it has been considered to implement an extra service method for Metadata that will support wildcards, which means that the user would not need wildcard support for Get as all signals could be stored locally by the client once in advance.

@rafaeling rafaeling marked this pull request as draft May 8, 2024 13:51
@SebastianSchildt
Copy link
Contributor

Yes, in terms of an updated/new API I am sure there may be better ways to split/do this

However, for the current API, we should not merge something into main, that changes behavior significantly (I mean this doesn"t reinterpret some unclarities, it would actually remove features contradict what is documented)

"Engineering" wise,I am not sure, but have you looked into whether "globbing" libraries are faster? Like i.e. https://github.com/devongovett/glob-match , I however lack the background to know if this is "fundamentally" faster then the regexp construction of state amchie and machting. And it also wil be a bandaid, because non-wildcard hashmap lookup shall be O(1) (with respect of number of signals, the hashing obviously would be O(l) with l being length of the path), and I doubt the most clever glob matching when backed by a "normal" map/list can ever be faster than O(n)..... But surprise me, if you can deliver O(1) we will probably take it :D

so tldr: I still think now and also in a future APIs ome kind of matching useful, but should be implemented in way that it is only "used when needed"

Off topic here, but since you mentioned it: I guess even when "discoverability" is in a metadata API a user might still want to subscribe to "a branch" (which can be seen as a "wildcard" sort of), otherwise something like the recorder from https://github.com/eclipse-kuksa/kuksa-csv-provider gets really hard, consider you want to generate a trace of everzthing running throug databroker (or a subbranch) but are not able to say "subscribe vehicle", but instead need to list/do 1000 individual paths

@rafaeling rafaeling force-pushed the databroker_performance_improvements branch from 820bb38 to 010fa74 Compare May 13, 2024 08:44
@rafaeling rafaeling marked this pull request as ready for review May 13, 2024 08:45
@rafaeling
Copy link
Contributor Author

Trying out the library you posted, will test it and check the performance results.

@rafaeling rafaeling force-pushed the databroker_performance_improvements branch from 43fe4e4 to bd14dcd Compare May 18, 2024 16:13
@rafaeling
Copy link
Contributor Author

Wildcard now uses glob-matching library https://github.com/devongovett/glob-match
Can be reviewed

@rafaeling rafaeling force-pushed the databroker_performance_improvements branch 3 times, most recently from 635f6a6 to f129492 Compare May 21, 2024 15:49
@SebastianSchildt SebastianSchildt requested a review from argerus May 24, 2024 06:55
@rafaeling rafaeling force-pushed the databroker_performance_improvements branch from f99747b to 6dff684 Compare May 29, 2024 07:23
databroker/src/broker.rs Show resolved Hide resolved
databroker/src/grpc/kuksa_val_v1/val.rs Outdated Show resolved Hide resolved
databroker/src/grpc/kuksa_val_v1/val.rs Outdated Show resolved Hide resolved
@SebastianSchildt
Copy link
Contributor

I think, that sutff like

**.*.*.*.Position is not supported anymore is fine, does not seem very real world, however is something preventing us to keep this

Vehicle.Cabin.Sunroof working as before? I have not checked the implementation, but might something like "if isBranch ad .**" make sense somewhere?

Rationale is: the first case I have never used, and find it hard to believe anyone did, the second one I definitely use often at least via CLI

@rafaeling
Copy link
Contributor Author

rafaeling commented Jun 5, 2024

I think, that sutff like

**.*.*.*.Position is not supported anymore is fine, does not seem very real world, however is something preventing us to keep this

Vehicle.Cabin.Sunroof working as before? I have not checked the implementation, but might something like "if isBranch ad .**" make sense somewhere?

Rationale is: the first case I have never used, and find it hard to believe anyone did, the second one I definitely use often at least via CLI

Vehicle.Cabin.Sunroof is not supported anymore, instead Vehicle.Cabin.Sunroof.** or Vehicle.Cabin.Sunroof.* should be use.
As far as I understand, there is no way in Databroker to determine if a path is a branch or not; Databroker can only internally query whether a path is contained in the database or not.
Previously, it was supported because Regex does not have as many limitations as the glob matching library approach.

databroker/src/broker.rs Show resolved Hide resolved
databroker/src/glob.rs Outdated Show resolved Hide resolved
databroker/src/glob.rs Outdated Show resolved Hide resolved
databroker/src/glob.rs Outdated Show resolved Hide resolved
databroker/src/glob.rs Show resolved Hide resolved
databroker/src/grpc/kuksa_val_v1/val.rs Outdated Show resolved Hide resolved
@SebastianSchildt
Copy link
Contributor

I think, that sutff like
**.*.*.*.Position is not supported anymore is fine, does not seem very real world, however is something preventing us to keep this
Vehicle.Cabin.Sunroof working as before? I have not checked the implementation, but might something like "if isBranch ad .**" make sense somewhere?
Rationale is: the first case I have never used, and find it hard to believe anyone did, the second one I definitely use often at least via CLI

Vehicle.Cabin.Sunroof is not supported anymore, instead Vehicle.Cabin.Sunroof.** or Vehicle.Cabin.Sunroof.* should be use. As far as I understand, there is no way in Databroker to determine if a path is a branch or not; Databroker can only internally query whether a path is contained in the database or not. Previously, it was supported because Regex does not have as many limitations as the glob matching library approach.

Ah, I remember, because "branches" are not really a thing in our internal data structure. In terms of keeping API compatibility, would something be feasible like

inputstr=`Vehicle.Cabin.Sunroof`

res = glob_the_shit_out_of_this(inputStr)

if res=no match
   res = glob_the_shit_out_of_this(inputStr+".**")

It should not really cost anything , when querying direct leaves.

I understand, for any kind of new API we would always use the "best" approch, but this is slightly modifying existing API behavior, so I would prefer to keep behavior changes to a minimum

@rafaeling
Copy link
Contributor Author

I think, that sutff like
**.*.*.*.Position is not supported anymore is fine, does not seem very real world, however is something preventing us to keep this
Vehicle.Cabin.Sunroof working as before? I have not checked the implementation, but might something like "if isBranch ad .**" make sense somewhere?
Rationale is: the first case I have never used, and find it hard to believe anyone did, the second one I definitely use often at least via CLI

Vehicle.Cabin.Sunroof is not supported anymore, instead Vehicle.Cabin.Sunroof.** or Vehicle.Cabin.Sunroof.* should be use. As far as I understand, there is no way in Databroker to determine if a path is a branch or not; Databroker can only internally query whether a path is contained in the database or not. Previously, it was supported because Regex does not have as many limitations as the glob matching library approach.

Ah, I remember, because "branches" are not really a thing in our internal data structure. In terms of keeping API compatibility, would something be feasible like

inputstr=`Vehicle.Cabin.Sunroof`

res = glob_the_shit_out_of_this(inputStr)

if res=no match
   res = glob_the_shit_out_of_this(inputStr+".**")

It should not really cost anything , when querying direct leaves.

I understand, for any kind of new API we would always use the "best" approch, but this is slightly modifying existing API behavior, so I would prefer to keep behavior changes to a minimum

Implemented your suggestions. Now it is also compatible for branches use cases like Vehicle.Cabin.Sunroof
Please test it, meanwhile I will try fix the format errors as soon as possible.

@SebastianSchildt
Copy link
Contributor

@rafaeling sounds good

It would be good, if the conflicts are resolved, because that seems to prevent CI runs, so currently can not lazily test by just pulling the docker image :D

@SebastianSchildt
Copy link
Contributor

Built manually and checked behavior. Looking good to me!

@rafaeling rafaeling force-pushed the databroker_performance_improvements branch 2 times, most recently from bb13826 to aea1aea Compare June 6, 2024 08:41
databroker/src/glob.rs Outdated Show resolved Hide resolved
databroker/src/glob.rs Outdated Show resolved Hide resolved
databroker/src/glob.rs Outdated Show resolved Hide resolved
databroker/src/glob.rs Outdated Show resolved Hide resolved
databroker/src/grpc/kuksa_val_v1/val.rs Outdated Show resolved Hide resolved
databroker/src/glob.rs Outdated Show resolved Hide resolved
databroker/src/glob.rs Outdated Show resolved Hide resolved
databroker/src/glob.rs Outdated Show resolved Hide resolved
@rafaeling rafaeling force-pushed the databroker_performance_improvements branch from bd47ae4 to b57e74f Compare June 7, 2024 13:25
@rafaeling rafaeling force-pushed the databroker_performance_improvements branch from b57e74f to 29a968b Compare June 7, 2024 13:28
Copy link
Contributor

@SebastianSchildt SebastianSchildt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 🍭

@SebastianSchildt SebastianSchildt merged commit 1f1f1a1 into eclipse-kuksa:main Jun 8, 2024
23 checks passed
@rafaeling rafaeling mentioned this pull request Jun 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants