Encode feature row before storing in Redis #530
Conversation
/test test-core-and-ingestion
ingestion/src/main/java/feast/store/serving/redis/FeatureRowToRedisMutationDoFn.java
private byte[] getValue(FeatureRow featureRow) {
  FeatureRowEncoder encoder =
      new FeatureRowEncoder(featureSets.get(featureRow.getFeatureSet()).getSpec());
  return encoder.encode(featureRow).toByteArray();
}
Would this make more sense as a static method encode(featureRow, featureSet)?
Yup, a static method does make more sense, unless we need to support multiple encoding mechanisms, which is unlikely.
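To make the suggestion concrete, a static helper could look roughly like the sketch below. This is a hypothetical illustration, not code from the PR: FeatureRowCodec is an invented name, and plain Map/List stand-ins replace the Feast proto types (FeatureRow, FeatureSetSpec) so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class FeatureRowCodec {
    // Static-method shape suggested in the review: encode(featureRow, spec)
    // rather than constructing a FeatureRowEncoder per row.
    // The row is reduced to its values, ordered by the alphabetically
    // sorted feature names from the spec, so names need not be stored.
    public static List<String> encode(Map<String, String> row, List<String> specFeatureNames) {
        List<String> sortedNames = new ArrayList<>(specFeatureNames);
        Collections.sort(sortedNames);
        List<String> values = new ArrayList<>();
        for (String name : sortedNames) {
            values.add(row.get(name));
        }
        return values;
    }
}
```

Since the method needs no per-row state, a static form avoids allocating an encoder object inside the DoFn's hot path.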
 * @return boolean
 */
public Boolean isEncodingValid(FeatureRow featureRow) {
  return featureRow.getFieldsList().size() == spec.getFeaturesList().size();
}
If you discover an old feature row (that is still within max age), how will you interpret it? Are we using the versions here to be able to interpret old feature rows?
The Redis key contains both the feature set name and version. Since both ingestion and serving encode and decode based on the feature set name and version specified in the Redis key, the spec should be consistent, and therefore isEncodingValid will always return true.
Originally I added this method to partially address the concern raised in #515:
If field values are only going to be associated with field names at runtime by external configuration has any thought been given to a method for ensuring that the same configuration that was used to write the data is the configuration used to read the data? Something such as a checksum/fingerprint of the FeatureSet configuration stored alongside the data in Redis (or in the key) will help prevent a mismatch of configuration and data due to a bug somewhere else in the system.
Perhaps I should remove this method from this PR?
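The checksum idea from the quote could be sketched as follows. This is a hypothetical illustration, not code from the PR: a fingerprint of the FeatureSet configuration (reduced here to its sorted feature names) would be stored alongside the encoded values, and compared again at read time to catch configuration/data mismatches.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SpecFingerprint {
    // Hypothetical sketch: fingerprint the configuration used to write the
    // data, so reads can verify they are interpreting it with the same spec.
    // Names are sorted first so the fingerprint is order-insensitive.
    public static String of(List<String> featureNames) {
        List<String> sorted = new ArrayList<>(featureNames);
        Collections.sort(sorted);
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(String.join(",", sorted).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is guaranteed on the JVM", e);
        }
    }
}
```

A field-count check like isEncodingValid catches only some mismatches (two different specs can have the same number of features); a fingerprint comparison catches any spec change.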
For this thread, and for the PR as a whole (especially the part quoted below): does this approach fall down and need to be reworked as versions get removed?
In cases where the Feature Rows come from an external Kafka source, where it is possible for a feature row to be malformed, there are a few checks in place:
- It's fine for a feature row to have feature fields in a different order than specified in the feature set spec, as the encoding process sorts the features alphabetically by name.
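Because the encoder orders features alphabetically by name, the read side can reassociate stored values with names positionally. A simplified stand-in sketch (FeatureRowDecoder is an invented name, plain strings instead of protos):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FeatureRowDecoder {
    // Simplified sketch: the stored row holds values only, in the order of
    // the alphabetically sorted feature names from the spec, so decoding
    // zips sorted names with values by position.
    public static Map<String, String> decode(List<String> values, List<String> specFeatureNames) {
        List<String> sortedNames = new ArrayList<>(specFeatureNames);
        Collections.sort(sortedNames);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < sortedNames.size(); i++) {
            row.put(sortedNames.get(i), values.get(i));
        }
        return row;
    }
}
```

This is why input field order does not matter: both sides canonicalize to the same sorted order before touching positions.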
Oh yeah, @zhilingc has brought this up in a comment thread on her versions removal RFC. Ouch 🤕
/test test-end-to-end-batch
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: khorshuheng, zhilingc.
* Encode feature row before storing in Redis * Include encoding as part of RedisMutationDoFn Co-authored-by: Khor Shu Heng <[email protected]>
What this PR does / why we need it:
Currently, feature names are stored together with the values in Redis, which results in an excessive memory footprint in cases where feature names are long strings.
This PR encodes the Feature Row prior to storing it in Redis. Encoding strategy:
In cases where the Feature Rows come from an external Kafka source, where it is possible for a feature row to be malformed, there are a few checks in place:
Good to have, but out of scope for this PR:
Which issue(s) this PR fixes:
Fixes #515.
Does this PR introduce a user-facing change?:
After this patch, the ingestion job will start storing feature rows in encoded format. However, the serving job is still able to read non-encoded feature rows, so it is not necessary to re-ingest data written prior to the patch, unless the user wishes to reduce the memory footprint of the existing keys.