Improve performance of _convert_arrow_to_proto
_convert_arrow_to_proto finds the index of each column it needs by
searching the arrow table's list of column names. Every search is a
linear scan of that list, and the scans are repeated for each row, so
the work grows with both the number of rows and the number of columns.

Rather than searching the list of names for every lookup, we build a
dictionary {column_name: index} once, before the row loop. Looking up a
column's index is then a dictionary access, which speeds up the method
significantly and benefits callers such as materialization.

Signed-off-by: Gunnar Sv Sigurbjörnsson <[email protected]>
nossrannug committed Oct 28, 2021
1 parent 9932600 commit 1165d87
Showing 1 changed file with 7 additions and 6 deletions.
13 changes: 7 additions & 6 deletions sdk/python/feast/infra/provider.py
@@ -304,27 +304,28 @@ def _coerce_datetime(ts):
         else:
             return ts

+    column_names_idx = {k: i for i, k in enumerate(table.column_names)}
     for row in zip(*table.to_pydict().values()):
         entity_key = EntityKeyProto()
         for join_key in join_keys:
             entity_key.join_keys.append(join_key)
-            idx = table.column_names.index(join_key)
+            idx = column_names_idx[join_key]
             value = python_value_to_proto_value(row[idx])
             entity_key.entity_values.append(value)
         feature_dict = {}
         for feature in feature_view.features:
-            idx = table.column_names.index(feature.name)
+            idx = column_names_idx[feature.name]
             value = python_value_to_proto_value(row[idx], feature.dtype)
             feature_dict[feature.name] = value
-        event_timestamp_idx = table.column_names.index(
+        event_timestamp_idx = column_names_idx[
             feature_view.batch_source.event_timestamp_column
-        )
+        ]
         event_timestamp = _coerce_datetime(row[event_timestamp_idx])

         if feature_view.batch_source.created_timestamp_column:
-            created_timestamp_idx = table.column_names.index(
+            created_timestamp_idx = column_names_idx[
                 feature_view.batch_source.created_timestamp_column
-            )
+            ]
             created_timestamp = _coerce_datetime(row[created_timestamp_idx])
         else:
             created_timestamp = None
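
The change in this hunk can be illustrated in isolation. The following is a minimal sketch, not the feast implementation: the column names, row values, and the two helper functions are hypothetical stand-ins for table.column_names and the rows produced by zip(*table.to_pydict().values()), so it runs without pyarrow. It only shows that a {column_name: index} dictionary built once gives the same values as repeated list.index calls.

```python
# Hypothetical stand-ins for table.column_names and the row tuples;
# these names and values are illustrative, not taken from feast.
column_names = ["driver_id", "conv_rate", "acc_rate", "event_timestamp"]
rows = [
    (1001, 0.5, 0.9, "2021-10-28T00:00:00"),
    (1002, 0.7, 0.8, "2021-10-28T01:00:00"),
]


def lookup_with_list_index(names, rows, wanted):
    """Previous approach: list.index scans the name list on every lookup."""
    return [[row[names.index(name)] for name in wanted] for row in rows]


def lookup_with_dict(names, rows, wanted):
    """New approach: build {column_name: index} once, then use dict lookups."""
    column_names_idx = {k: i for i, k in enumerate(names)}
    return [[row[column_names_idx[name]] for name in wanted] for row in rows]


wanted = ["conv_rate", "event_timestamp"]
old_result = lookup_with_list_index(column_names, rows, wanted)
new_result = lookup_with_dict(column_names, rows, wanted)
assert old_result == new_result
```

Two behavioral differences are worth keeping in mind when making this kind of swap: a missing name raises KeyError from the dictionary where list.index raised ValueError, and if a name appears more than once the dictionary keeps the last index while list.index returns the first.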