Convert python values into proto values in bulk #2172

Merged: 5 commits into feast-dev:master on Dec 30, 2021

@pyalex pyalex commented Dec 28, 2021

Signed-off-by: pyalex [email protected]

What this PR does / why we need it:

Packing native Python values (or NumPy arrays) into proto objects (instances of the generated proto classes) takes roughly 80% of the time during materialization (not counting the offline / online storage part). Ironically, protobuf serialization itself takes only around 10% of the time.

This PR refactors the python_values_to_proto_values function to process data in bulk (essentially column by column), which improves performance by running all type checks only once per batch / column. According to my analysis, this change will decrease packing time by roughly 50% without introducing significant changes (see first comment).
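The idea can be illustrated with a minimal sketch. It does not use Feast's generated proto classes: `FakeValue`, `convert_one` and `convert_column` are hypothetical stand-ins, and the real python_values_to_proto_values handles many more types. The only point is that the type dispatch runs once per column instead of once per value.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FakeValue:
    """Hypothetical stand-in for a generated proto Value message."""
    field: Optional[str] = None
    val: Any = None


# Map a Python type to (target field name, caster). Purely illustrative.
_DISPATCH: Dict[type, tuple] = {
    int: ("int64_val", int),
    float: ("double_val", float),
    str: ("string_val", str),
}


def convert_one(value: Any) -> FakeValue:
    """Per-value path: the type lookup runs for every single value."""
    if value is None:
        return FakeValue()
    field, cast = _DISPATCH[type(value)]
    return FakeValue(field, cast(value))


def convert_column(values: List[Any]) -> List[FakeValue]:
    """Bulk path: infer the type from one non-null sample, then reuse it."""
    sample = next((v for v in values if v is not None), None)
    if sample is None:
        # Nothing to infer from: every value is null.
        return [FakeValue() for _ in values]
    field, cast = _DISPATCH[type(sample)]
    return [FakeValue() if v is None else FakeValue(field, cast(v)) for v in values]
```

For a homogeneous column, both paths produce the same result, e.g. `convert_column([1.0, None, 2.5]) == [convert_one(v) for v in [1.0, None, 2.5]]`. The bulk path trades per-value safety for speed, so it assumes that all non-null values in a column share one type.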

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:

none

Signed-off-by: pyalex <[email protected]>

codecov-commenter commented Dec 28, 2021

Codecov Report

Merging #2172 (a65682c) into master (c9ff695) will increase coverage by 0.00%.
The diff coverage is 85.00%.

@@           Coverage Diff           @@
##           master    #2172   +/-   ##
=======================================
  Coverage   84.58%   84.58%           
=======================================
  Files         102      102           
  Lines        8186     8195    +9     
=======================================
+ Hits         6924     6932    +8     
- Misses       1262     1263    +1     
Flag Coverage Δ
integrationtests 74.27% <85.00%> (-0.27%) ⬇️
unittests 59.03% <52.50%> (+0.02%) ⬆️

Impacted Files Coverage Δ
sdk/python/feast/type_map.py 72.65% <77.77%> (-0.52%) ⬇️
sdk/python/feast/feature_store.py 91.39% <100.00%> (+0.03%) ⬆️
sdk/python/feast/infra/provider.py 90.09% <100.00%> (+0.18%) ⬆️
sdk/python/feast/online_response.py 87.71% <100.00%> (ø)
.../integration/online_store/test_universal_online.py 98.13% <0.00%> (+0.46%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@pyalex pyalex marked this pull request as ready for review December 28, 2021 18:29

pyalex commented Dec 28, 2021

My performance analysis was based on the case proposed by @judahrand:
a feature view with 3 features of primitive type and 3 features of list type:

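from feast import Feature, FeatureView, ValueType

# FakeSource is the benchmark's stand-in batch source; its definition is not shown here.
fv = FeatureView(
    name="fv",
    entities=["fake_entity"],
    features=[
        # Three primitive (DOUBLE) features named "0", "1", "2"
        Feature(
            name=str(i),
            dtype=ValueType.DOUBLE,
        ) for i in range(3)
    ] + [
        # Three list (FLOAT_LIST) features named "5", "6", "7"
        Feature(
            name=str(i),
            dtype=ValueType.FLOAT_LIST,
        ) for i in range(5, 8)
    ],
    ttl=None,
    batch_source=FakeSource,
)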

A dataset with 10 thousand rows was generated:

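import numpy as np

ROWS = 10_000
data = {
    # random_integers is deprecated; randint excludes `high`, and int64 keeps 1e12 in range on every platform
    'id': np.random.randint(low=0, high=int(1e12), size=(ROWS,), dtype=np.int64)
}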
data.update(
    {
        str(i): np.random.random(size=(ROWS,)).astype('float64')
        for i in range(3)
    }
)
data.update(
    {
        str(i): [np.random.random(size=(10,)).astype('float32') for _ in range(ROWS)]
        for i in range(5, 8)
    }
)

The benchmark target was feast.infra.provider._convert_arrow_to_proto, run on my local machine.

Results:
master: 570 ms (average)
this PR: 320 ms (average).
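
For reference, these two averages work out to roughly a 44% reduction in wall time (about a 1.78x speedup), measured on the whole _convert_arrow_to_proto call rather than on the packing step alone. The arithmetic, as a small sketch (the helper name `relative_improvement` is hypothetical):

```python
def relative_improvement(before_ms: float, after_ms: float) -> tuple:
    """Return (fractional time reduction, speedup factor)."""
    reduction = (before_ms - after_ms) / before_ms
    speedup = before_ms / after_ms
    return reduction, speedup


# Averages reported above: master vs. this PR.
reduction, speedup = relative_improvement(570, 320)
print(f"{reduction:.0%} less time, {speedup:.2f}x faster")
# prints "44% less time, 1.78x faster"
```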

@judahrand

This looks great!

"""
# ToDo: make a better sample for type checks (more than one element)

Mind making an issue tracking this after the PR goes in?
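
One possible direction for that ToDo, sketched under assumptions: instead of inspecting a single element, check a small sample of non-null values and fail if their types disagree. `infer_column_type` and `sample_size` are hypothetical names for illustration and are not part of Feast.

```python
from itertools import islice
from typing import Any, Iterable, Optional


def infer_column_type(values: Iterable[Any], sample_size: int = 10) -> Optional[type]:
    """Infer a column's Python type from up to `sample_size` non-null values.

    Returns None when the column has no non-null values; raises TypeError
    when the sample contains more than one type.
    """
    non_null = (v for v in values if v is not None)
    seen = {type(v) for v in islice(non_null, sample_size)}
    if not seen:
        return None
    if len(seen) > 1:
        names = sorted(t.__name__ for t in seen)
        raise TypeError(f"mixed types in column sample: {names}")
    return seen.pop()
```

A larger `sample_size` catches more mixed-type columns at the cost of more checks per column. Values beyond the sample are still not inspected.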

@adchia adchia left a comment

/lgtm

@feast-ci-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adchia, pyalex

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@feast-ci-bot feast-ci-bot merged commit 680a7af into feast-dev:master Dec 30, 2021