Newest (latest?) value of a feature #77

Closed
budi opened this issue Jan 16, 2019 · 9 comments

Comments

@budi
Contributor

budi commented Jan 16, 2019

In an effort to make one of my creation pipelines bearable for Feast's ingestion, I tried to merge feature rows into one row if they have the same entity key, entity id, granularity, and event timestamp (basically everything in FeatureRowKey), but then I realized that there is no notion of a "newer feature row" in Feast.

Try 1: Merge by FeatureRowKey.
Suppose I merge FeatureRows by creating a KV of FeatureRowKey and FeatureRow. This works only if there are no conflicting feature ids within a feature row. If there are, we can't compare them because we can't tell which one is the "newest".
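
A minimal sketch of this attempt in Python, using plain tuples and dicts as stand-ins for FeatureRowKey and FeatureRow. The `merge_by_key` helper and the row layout are hypothetical, not Feast's API. It merges rows that share a key and fails on a conflicting feature id, since nothing tells us which value is newer:

```python
from collections import defaultdict


def merge_by_key(rows):
    """Merge rows sharing a key; fail when two rows disagree on a feature id.

    Each row is (key, features): key is a hashable stand-in for
    FeatureRowKey, features maps feature id -> value (both hypothetical).
    """
    merged = defaultdict(dict)
    for key, features in rows:
        target = merged[key]
        for feature_id, value in features.items():
            if feature_id in target and target[feature_id] != value:
                # Same key, same feature id, different values: there is no
                # field left to decide which one is the "newest".
                raise ValueError(
                    f"conflicting values for {feature_id!r} in {key!r}"
                )
            target[feature_id] = value
    return dict(merged)
```

Rows with disjoint feature ids merge cleanly; two rows carrying different values for the same feature id under the same key raise `ValueError`.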

Try 2: None granularity.
By changing the creation side, I can disregard Granularity entirely, group by entity key and entity name, and produce a FeatureRow every time a new feature arrives (or even per window). This eliminates the need to compare for the "newest" feature when there are conflicting feature ids, but it also takes up more resources. It also bloats the row count, which defeats the purpose of the effort above in the first place.

Is there any workaround for this?

I think if we disregard Granularity and treat the event timestamp as a processing timestamp, the creation side gets more room in how it produces FeatureRows.

I would rather have my FeatureRows produced less frequently, fat, and full of values, if my resources allow it, than very frequently, lean, and full of nulls.

Related : #53 #17

@tims
Contributor

tims commented Jan 18, 2019

@budi it seems like your question is more about your creation pipeline than Feast itself?

@budi I'm actually working on a PR that will merge feature rows, coalescing them. For batch it's basically combine on entity, entityKey, granularity. For streaming we need to keep global state. This is probably not desirable for very high throughput streaming jobs. So it should be possible to disable.
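
For illustration only, a rough sketch of what batch coalescing on entity, entityKey, and granularity could look like. This is not the code from the PR: the `coalesce` helper and the dict layout are hypothetical, and the rule that the value with the latest event timestamp wins is an assumption.

```python
def coalesce(rows):
    """Coalesce rows per (entity, entity_key, granularity).

    Assumption: for each feature id, the value carried by the row with the
    latest event timestamp wins. Rows are plain dicts, not Feast's
    FeatureRow type.
    """
    latest = {}  # group key -> {feature id: (event_timestamp, value)}
    for row in rows:
        group = (row["entity"], row["entity_key"], row["granularity"])
        features = latest.setdefault(group, {})
        for feature_id, value in row["features"].items():
            seen = features.get(feature_id)
            # Keep the value only if it is at least as recent as the one held.
            if seen is None or row["event_timestamp"] >= seen[0]:
                features[feature_id] = (row["event_timestamp"], value)
    # Drop the timestamps and return one merged feature map per group.
    return {
        group: {fid: value for fid, (_, value) in features.items()}
        for group, features in latest.items()
    }
```

Because the comparison uses the event timestamp carried by each row, the result does not depend on the order in which rows arrive in the batch. The streaming case would need the `latest` map to live in global state, which is the cost mentioned above.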

@budi
Contributor Author

budi commented Jan 18, 2019

@tims yes, it's mainly an effort to merge features in a feature row during creation in a somewhat efficient way before sending them to Feast.

@budi
Contributor Author

budi commented Jan 18, 2019

Ah, nevertheless, for streaming data to BQ this is still going to be a problem: when you create a table row and sort by created timestamp to get the latest, there is no guarantee that the row with the latest created timestamp has the latest value.

@tims
Contributor

tims commented Jan 18, 2019

That's what the event timestamp is for, isn't it?
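
A small illustration of the difference, with made-up rows and a hypothetical `latest_by` helper (in a real table this would be a sort on the corresponding column):

```python
# (created_timestamp, event_timestamp, value). The second row arrived late:
# it was created last but describes an older event.
rows = [
    (100, 20, "new"),
    (101, 10, "old"),
]


def latest_by(rows, index):
    """Return the value of the row with the largest timestamp at `index`."""
    return max(rows, key=lambda row: row[index])[2]


by_created = latest_by(rows, 0)  # "old": the late arrival wins
by_event = latest_by(rows, 1)    # "new": the most recent event wins
```

Sorting by created timestamp picks the late-arriving stale row, while sorting by event timestamp picks the value of the most recent event regardless of arrival order.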

@tims
Contributor

tims commented Jan 18, 2019

Closing this as it's not a Feast bug or feature request, see related issues.

@tims tims closed this as completed Jan 18, 2019
@tims
Contributor

tims commented Jan 18, 2019

also see #88

@budi
Contributor Author

budi commented Jan 18, 2019

Right, yes, I get it now. So with this, is it correct to assume that the risk of messages being processed out of order is outside of Feast's domain?

@tims
Contributor

tims commented Jan 18, 2019

No, Feast will protect against that with #88. We originally intended to store them all and use BigTable and scans to fetch the recent ones, but we've come to see that low-latency access was more important.

Just an aside though:
We could still implement that for BigTable if we wanted; it can be store-specific. But we need to refactor a little bit so that serving, core, and ingestion access a common library. We're getting there.

@budi
Contributor Author

budi commented Jan 18, 2019

cool, got it


2 participants