
fix: Addresses ZeroDivisionError when materializing file source with same timestamps #2551

Merged 11 commits on Apr 15, 2022.
docs/reference/feature-servers/go-feature-retrieval.md (1 addition, 1 deletion)

```diff
@@ -10,7 +10,7 @@ The Go Feature Retrieval component currently only supports Redis and Sqlite as o

 ## Installation

-As long as you are running macOS or linux x86 with python version 3.7-3.10, the go component comes pre-compiled when you run install feast.
+As long as you are running macOS or linux, on x86, with python version 3.7-3.10, the go component comes pre-compiled when you install feast.

 For developers, if you want to build from source, run `make compile-go-lib` to build and compile the go server.
```
sdk/python/feast/infra/offline_stores/file.py (18 additions, 4 deletions)

```diff
@@ -299,11 +299,25 @@ def evaluate_offline_job():
             if created_timestamp_column
             else [event_timestamp_column]
         )
-        if created_timestamp_column:
-            source_df = source_df.sort_values(by=created_timestamp_column)
-
-        source_df = source_df.sort_values(by=event_timestamp_column)
+        # try-catch block is added to deal with this issue https://github.com/dask/dask/issues/8939.
+        # TODO(kevjumba): remove try catch when fix is merged upstream in Dask.
+        try:
+            if created_timestamp_column:
+                source_df = source_df.sort_values(by=created_timestamp_column,)
+
+            source_df = source_df.sort_values(by=event_timestamp_column)
+
+        except ZeroDivisionError:
+            # Use 1 partition to get around case where everything in timestamp column is the same so the partition algorithm doesn't
+            # try to divide by zero.
+            if created_timestamp_column:
+                source_df = source_df.sort_values(
+                    by=created_timestamp_column, npartitions=1
+                )
+
+            source_df = source_df.sort_values(
+                by=event_timestamp_column, npartitions=1
+            )

         source_df = source_df[
             (source_df[event_timestamp_column] >= start_date)
```