Improve performance of _convert_arrow_to_proto (#1984)
_convert_arrow_to_proto does several lookups in the list of column
names of the Arrow table to find the index of a column. Each lookup
(list.index) is a linear scan, and the lookups are repeated for every
row, so the cost grows with both the number of rows and the number of
columns.

Rather than searching the list of names for each index, we build a
dictionary {column_name: index} once, before iterating over the rows.
Dictionary lookups take constant time on average, which speeds up the
method significantly and, in turn, the operations that depend on it,
e.g. materialization.
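The difference can be illustrated with a small, self-contained Python sketch (not code from Feast): `list.index` scans the list from the start on every call, while a dictionary built once with `enumerate` answers each lookup in constant time on average. The column names, the looked-up keys and the row count below are made up for illustration.

```python
import timeit

# Hypothetical column names standing in for table.column_names.
column_names = [f"col_{i}" for i in range(200)]
# Hypothetical columns that would be looked up once per row.
lookups = ["col_199", "col_150", "col_100"]


def indices_with_list(names, keys):
    """Before: a linear scan of the list for every lookup."""
    return [names.index(k) for k in keys]


# After: build the name -> index mapping once...
column_names_idx = {k: i for i, k in enumerate(column_names)}


def indices_with_dict(idx_map, keys):
    """...then each lookup is an average O(1) dictionary access."""
    return [idx_map[k] for k in keys]


# Both approaches return the same positions.
assert indices_with_list(column_names, lookups) == indices_with_dict(column_names_idx, lookups)

# Rough timing over 10,000 simulated rows; absolute numbers vary by machine.
t_list = timeit.timeit(lambda: indices_with_list(column_names, lookups), number=10_000)
t_dict = timeit.timeit(lambda: indices_with_dict(column_names_idx, lookups), number=10_000)
print(f"list.index: {t_list:.4f}s  dict lookup: {t_dict:.4f}s")
```

Building the dictionary is itself a single O(n) pass over the names, so it pays off when the same list is queried many times, as happens when the lookups run once per row.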

Signed-off-by: Gunnar Sv Sigurbjörnsson <[email protected]>
nossrannug authored Oct 28, 2021
1 parent d09b313 commit 66d6b78
Showing 1 changed file with 7 additions and 6 deletions.
13 changes: 7 additions & 6 deletions sdk/python/feast/infra/provider.py
```diff
@@ -304,27 +304,28 @@ def _coerce_datetime(ts):
         else:
             return ts
 
+    column_names_idx = {k: i for i, k in enumerate(table.column_names)}
     for row in zip(*table.to_pydict().values()):
         entity_key = EntityKeyProto()
         for join_key in join_keys:
             entity_key.join_keys.append(join_key)
-            idx = table.column_names.index(join_key)
+            idx = column_names_idx[join_key]
             value = python_value_to_proto_value(row[idx])
             entity_key.entity_values.append(value)
         feature_dict = {}
         for feature in feature_view.features:
-            idx = table.column_names.index(feature.name)
+            idx = column_names_idx[feature.name]
             value = python_value_to_proto_value(row[idx], feature.dtype)
             feature_dict[feature.name] = value
-        event_timestamp_idx = table.column_names.index(
+        event_timestamp_idx = column_names_idx[
             feature_view.batch_source.event_timestamp_column
-        )
+        ]
         event_timestamp = _coerce_datetime(row[event_timestamp_idx])
 
         if feature_view.batch_source.created_timestamp_column:
-            created_timestamp_idx = table.column_names.index(
+            created_timestamp_idx = column_names_idx[
                 feature_view.batch_source.created_timestamp_column
-            )
+            ]
             created_timestamp = _coerce_datetime(row[created_timestamp_idx])
         else:
             created_timestamp = None
```
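The pattern in this change, stripped of Feast's protobuf types, might look like the sketch below. It assumes a plain dict of equal-length column lists in place of `table.to_pydict()`; the function `rows_to_records` and its parameters are hypothetical and not part of Feast. Here the timestamp index is also computed once before the loop, a small additional step the commit itself does not take.

```python
from typing import Any, Dict, List, Tuple


def rows_to_records(
    columns: Dict[str, List[Any]],
    join_keys: List[str],
    feature_names: List[str],
    timestamp_column: str,
) -> List[Tuple[Dict[str, Any], Dict[str, Any], Any]]:
    """Hypothetical helper: turn column-oriented data into per-row
    (entity, features, timestamp) tuples."""
    # Build the name -> position map once, outside the row loop.
    column_names_idx = {k: i for i, k in enumerate(columns.keys())}
    ts_idx = column_names_idx[timestamp_column]

    records = []
    # zip(*values) yields one tuple per row, with values in column order.
    for row in zip(*columns.values()):
        entity = {k: row[column_names_idx[k]] for k in join_keys}
        features = {f: row[column_names_idx[f]] for f in feature_names}
        records.append((entity, features, row[ts_idx]))
    return records


data = {"driver_id": [1, 2], "conv_rate": [0.5, 0.7], "event_ts": ["t1", "t2"]}
print(rows_to_records(data, ["driver_id"], ["conv_rate"], "event_ts"))
# [({'driver_id': 1}, {'conv_rate': 0.5}, 't1'), ({'driver_id': 2}, {'conv_rate': 0.7}, 't2')]
```

Two behavioral details differ from `list.index`: a missing column now raises `KeyError` instead of `ValueError`, and if the column names contained duplicates, the dictionary comprehension would keep the last position rather than the first.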