-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increase efficiency of Registry updates #1698
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1698 +/- ##
==========================================
+ Coverage 84.26% 84.45% +0.18%
==========================================
Files 78 79 +1
Lines 6910 7012 +102
==========================================
+ Hits 5823 5922 +99
- Misses 1087 1090 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: achals, felixwang9817 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@felixwang9817 linter needs to be fixed, and I had some questsions/nits |
83ed537
to
7fbeb8d
Compare
/retest |
I think it's important that this PR have explicit "documentation" on the changes that are being introduced to the registry. @felixwang9817 can you please write out the exact changes (and trade offs) that we are making here? What is the before/after? |
@woop I added some documentation to the PR, let me know if you want any further clarification. |
One part that isn't clear: If I am a reader only (like a model serving application that has no write access to the registry and doesn't plan to run |
80a1814
to
c252f47
Compare
@woop I revised my documentation, see above. tldr: there are no changes for a reader, only a writer. |
Signed-off-by: Felix Wang <[email protected]>
Signed-off-by: Felix Wang <[email protected]>
Signed-off-by: Felix Wang <[email protected]>
Signed-off-by: Felix Wang <[email protected]>
c252f47
to
7ca4725
Compare
/lgtm |
* Modify Registry to allow for delayed persistence of changes Signed-off-by: Felix Wang <[email protected]> * Increase efficiency of Registry updates Signed-off-by: Felix Wang <[email protected]> * Registry tests Signed-off-by: Felix Wang <[email protected]> * Change copyright date Signed-off-by: Felix Wang <[email protected]> Signed-off-by: CS <[email protected]>
Signed-off-by: Felix Wang [email protected]
What this PR does / why we need it: make Registry updates more efficient. Right now, Registry updates are done by looping through FeatureViews and Entities and applying them one at a time. Applying an update to the Registry forces the underlying RegistryStore to update the underlying RegistryProto and write it (either to disk, GCS, or S3, depending on the backend of choice). Thus, looped writes are highly inefficient.
Current design: the Registry current has a cache to improve reading speed. Writes always go through to the backend, but not through the cache. Thus, the cache is not always up-to-date with the ground truth in the backend. The user is granted a lot of optionality around the cache: they may choose not to use the cache, thus reading from backend directly, or they may force the cache to refresh.
After this PR: the cache will be write-back, meaning writes always go to the cache, and only go the backend when flushed. Then, looping updates to the Registry will be fast since all updates will be performed in-memory, and flushed once at the end. Note that this change only affects writing to the cache. Reading from the cache is unchanged: the
refresh
andcache_ttl
mechanisms are still available to readers.Advantages: looping updates become fast. Arguably, makes the boundary between the Registry and RegistryStore classes more clear (right now the RegistryStore takes in an updater function which means it is changing the proto itself, whereas in the new version the Registry would be changing the proto, which makes more sense since the RegistryStore should only be storing the proto, not changing it).
Disadvantages: changes the Registry interface (but not in a breaking fashion). More importantly, introduces more complexity around managing concurrent access to a Registry. Also, theoretically leaves a greater window for errors to occur (e.g. a bunch of operations could occur on the in-memory cache, then the process could crash without persisting those changes).
Alternative solution: expose a new function such as apply_all_changes(feature_views: List[FeatureView], entities: List[Entity]) that applies all changes in memory and then persists it. This solution does not require changing the current usage of the cache, but changes the Registry interface with a method that seems somewhat hacky.
Which issue(s) this PR fixes:
Fixes #
Does this PR introduce a user-facing change?: