Newest (latest?) value of a feature #77
Comments
@budi it seems like your question is more about your creation pipeline than Feast itself? @budi I'm actually working on a PR that will merge feature rows by coalescing them. For batch, it basically combines on entity, entityKey, and granularity. For streaming, we need to keep global state. This is probably not desirable for very high-throughput streaming jobs, so it should be possible to disable.
@tims yes, it's mainly an effort to merge features into a feature row during creation in a somewhat efficient way before sending them to Feast.
Ah, nevertheless, for streaming data to BigQuery, this is still going to be a problem: when you create a table row and sort by created timestamp to get the latest, there is no guarantee that the row with the latest created timestamp has the latest value.
That's what the event timestamp is for, isn't it?
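To make the created-versus-event timestamp point concrete, here is a minimal sketch. The field names `event_ts` and `created_ts` and the helper `latest_by` are assumptions for illustration only; they are not Feast's schema or API.

```python
from typing import Dict, List


def latest_by(rows: List[Dict], timestamp_field: str) -> Dict:
    """Return the row with the largest value in the given timestamp field."""
    return max(rows, key=lambda r: r[timestamp_field])


# Hypothetical rows: the older event was written to the table later
# (for example, a delayed message in a streaming job).
rows = [
    {"value": "new", "event_ts": 20, "created_ts": 100},
    {"value": "old", "event_ts": 10, "created_ts": 200},  # late arrival
]

# Sorting by creation time picks the stale value;
# sorting by event time picks the value that is actually newest.
stale = latest_by(rows, "created_ts")
fresh = latest_by(rows, "event_ts")
```

With these rows, `stale["value"]` is `"old"` and `fresh["value"]` is `"new"`, which is the ordering risk described above.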
Closing this as it's not a Feast bug or feature request; see related issues.
Also see #88.
Right, yes, I get it now. So with this, is it correct to assume that the risk of messages being processed out of order is outside of Feast's domain?
No, Feast will protect against that with #88. We originally intended to store them all and use BigTable and scans to fetch the recent ones, but we've come to see that low-latency access was more important. Just an aside though:
Cool, got it.
In an effort to make one of my creation pipelines bearable for Feast's ingestion, I tried to merge feature rows into one row if they have the same entity key, entity id, granularity, and event timestamp (basically everything in FeatureRowKey), but then I realized that there is no notion of a "newer feature row" in Feast.
Try 1: Merge by FeatureRowKey.
Suppose I want to merge feature rows by creating a KV of FeatureRowKey and FeatureRow. This works only if there are no conflicting feature ids among the rows that share a key. If there are, we can't compare them because we can't tell which one is the "newest".
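A rough sketch of Try 1 in Python, under stated assumptions: the `FeatureRowKey` and `FeatureRow` classes below are simplified stand-ins for illustration, not Feast's actual protobuf types, and `merge_by_key` and `ConflictingFeatureError` are hypothetical names. It shows where the merge breaks down: two rows with the same key and different values for the same feature id cannot be ordered.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class FeatureRowKey:
    # Simplified stand-in for the fields listed in the post.
    entity_name: str
    entity_key: str
    granularity: str
    event_timestamp: int


@dataclass
class FeatureRow:
    key: FeatureRowKey
    features: Dict[str, object] = field(default_factory=dict)


class ConflictingFeatureError(ValueError):
    """Raised when two rows with the same key disagree on a feature value."""


def merge_by_key(rows: Iterable[FeatureRow]) -> List[FeatureRow]:
    """Merge rows that share a FeatureRowKey into one row per key."""
    merged: Dict[FeatureRowKey, FeatureRow] = {}
    for row in rows:
        target = merged.setdefault(row.key, FeatureRow(row.key))
        for feature_id, value in row.features.items():
            if feature_id in target.features and target.features[feature_id] != value:
                # Same key, same feature id, different value: nothing in the
                # key tells us which value is newer, so refuse to guess.
                raise ConflictingFeatureError(
                    f"conflicting values for {feature_id!r} under {row.key}"
                )
            target.features[feature_id] = value
    return list(merged.values())
```

Rows with disjoint feature ids merge cleanly into one fat row; the conflict branch is exactly the case where a notion of "newest" would be needed.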
Try 2: None granularity.
By messing around on the creation part, I can disregard granularity entirely, group by entity key and entity name, and produce a FeatureRow every time a new feature arrives (or even per window). This eliminates the need to compare for the "newest" feature when there are conflicting feature ids, but it also takes up more resources. Not to mention that it bloats the result count, which defeats the purpose of the above effort in the first place.
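Try 2 could look roughly like the sketch below. The tuple layout of an update and the `emit_on_arrival` helper are assumptions made for illustration, not anything from Feast. State is kept per (entity name, entity key), the last arrival wins for each feature id, and a full row is emitted on every update.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical update shape: (entity_name, entity_key, feature_id, value)
Update = Tuple[str, str, str, object]


def emit_on_arrival(
    updates: Iterable[Update],
) -> Iterator[Tuple[str, str, Dict[str, object]]]:
    """Emit one merged row per incoming feature update, in arrival order."""
    state: Dict[Tuple[str, str], Dict[str, object]] = {}
    for entity_name, entity_key, feature_id, value in updates:
        current = state.setdefault((entity_name, entity_key), {})
        current[feature_id] = value  # last arrival wins, no comparison needed
        # Yield a copy so later updates do not mutate rows already emitted.
        yield entity_name, entity_key, dict(current)
```

Every input update produces one output row, so the number of rows is not reduced at all, and the per-entity state has to be held for as long as the job runs. That is the resource and row-count cost mentioned above.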
Is there any workaround for this?
I think if we disregard granularity and think of the event timestamp as a processing timestamp, we can leave the creation part more room in how it produces FeatureRows.
I myself would rather have my FeatureRows produced less frequently, fat, and full of values if my resources allow it, rather than very frequently, lean, and full of nulls.
Related: #53 #17