Extend context for usage statistics collection & add latencies for performance analysis #1983
Codecov Report
@@ Coverage Diff @@
## master #1983 +/- ##
==========================================
- Coverage 82.21% 80.52% -1.69%
==========================================
Files 100 103 +3
Lines 8052 9161 +1109
==========================================
+ Hits 6620 7377 +757
- Misses 1432 1784 +352
…rformance analysis Signed-off-by: pyalex <[email protected]>
Benchmarks showed that usage decorators & context managers don't add significant overhead: Master (FEAST_USAGE=False) vs Refactored usage decorators (FEAST_USAGE=True)
f"SELECT entity_key, feature_name, value, event_ts "
f"FROM {_table_id(config.project, table)} "
f"WHERE entity_key IN ({','.join('?' * len(entity_keys))}) "
f"ORDER BY entity_key",
[serialize_entity_key(entity_key) for entity_key in entity_keys],
Looks like this SQL query got changed. Any reasons why it's part of this PR?
This is done to avoid using the tracing_span context manager in a loop, so I refactored this function to make a single call to SQLite instead. As a side effect, this sped up SQLite retrieval by 10-15% in total. The change also seemed too small to warrant a separate PR.
Any concerns with the query itself?
Gotcha, no concerns with the query
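The refactoring discussed in this thread (one batched `IN (...)` query instead of one query per entity key) can be sketched as follows. This is a minimal illustration, not the code from the PR: the function name `read_features`, the table name `demo_features`, and the sample rows are hypothetical, and entity keys are passed as raw bytes rather than through `serialize_entity_key`.

```python
import sqlite3
from itertools import groupby


def read_features(conn, table, entity_keys):
    """Fetch rows for all entity keys with one query instead of one per key."""
    if not entity_keys:
        return []
    # One "?" placeholder per key, so values are bound and never interpolated.
    placeholders = ",".join("?" * len(entity_keys))
    cur = conn.execute(
        f"SELECT entity_key, feature_name, value, event_ts "
        f"FROM {table} "
        f"WHERE entity_key IN ({placeholders}) "
        f"ORDER BY entity_key",
        entity_keys,
    )
    # ORDER BY makes rows of the same key adjacent, which groupby requires.
    rows_by_key = {
        key: [(name, value, ts) for _, name, value, ts in group]
        for key, group in groupby(cur.fetchall(), key=lambda row: row[0])
    }
    # Preserve the request order; keys without rows map to an empty list.
    return [rows_by_key.get(key, []) for key in entity_keys]


# Hypothetical demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_features "
    "(entity_key BLOB, feature_name TEXT, value BLOB, event_ts TEXT)"
)
conn.executemany(
    "INSERT INTO demo_features VALUES (?, ?, ?, ?)",
    [
        (b"k1", "trips", b"\x01", "2021-10-01"),
        (b"k1", "rating", b"\x05", "2021-10-01"),
        (b"k2", "rating", b"\x02", "2021-10-02"),
    ],
)
result = read_features(conn, "demo_features", [b"k2", b"missing", b"k1"])
```

Because the result is regrouped in request order, callers see the same shape as with a per-key loop, while the database is hit only once.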
sdk/python/feast/infra/provider.py
Outdated
if config.provider == "gcp":
    from feast.infra.gcp import GcpProvider
    return PassthroughProvider(config)
    return GcpProvider(config)
from feast.infra.local import LocalProvider
return LocalProvider(config)
nit: can we make this logic similar to how we deal with other plugins (online stores, offline stores, etc.)? In other words, create a mapping of provider name to class path, as in feast/sdk/python/feast/repo_config.py, line 29 in 41affbb:
ONLINE_STORE_CLASS_FOR_TYPE = {
Fixed
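The suggestion in this thread, a mapping from provider name to class path that is resolved by a dynamic import, might look roughly like the sketch below. The names `PROVIDER_CLASS_FOR_TYPE` and `get_provider_class` are hypothetical, and the standard-library class paths are stand-ins so the example runs on its own; the real mapping would point at the provider classes instead.

```python
import importlib

# Hypothetical mapping; the stdlib paths below are placeholders for real
# provider class paths.
PROVIDER_CLASS_FOR_TYPE = {
    "gcp": "collections.OrderedDict",
    "local": "collections.Counter",
}


def get_provider_class(provider_type):
    """Resolve a provider name to its class via the mapping."""
    try:
        class_path = PROVIDER_CLASS_FOR_TYPE[provider_type]
    except KeyError:
        raise ValueError(f"Unknown provider type: {provider_type!r}") from None
    # Split "package.module.ClassName" into module path and class name.
    module_name, class_name = class_path.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

With this shape, adding a provider means adding one dictionary entry rather than another `if` branch, and the provider module is imported only when it is actually requested.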
try:
    yield
finally:
nit: should we record the exception in an except block here?
The exception is handled in the @log_usage decorator. The tracing_span context manager must be called only after this decorator has been applied; in other words, UsageContext must already be available.
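The division of labor described in this reply can be sketched as below: the decorator owns the context and records exceptions, while the span only records timing in its `finally` block. The names `log_usage`, `tracing_span`, and `UsageContext` come from the thread, but everything else is an assumption for illustration, including the context fields, the `EXPORTED_EVENTS` list, the `RuntimeError` raised when no context exists, and the two demo functions. It is not the implementation from the PR.

```python
import contextlib
import contextvars
import functools
import time

_usage_ctx = contextvars.ContextVar("usage_ctx", default=None)
EXPORTED_EVENTS = []  # stand-in for sending the event to a collector


class UsageContext:
    """Hypothetical per-call-stack context shared by nested functions."""

    def __init__(self):
        self.call_stack = []
        self.traces = []  # (span name, elapsed seconds)
        self.exception = None


def log_usage(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        ctx = _usage_ctx.get()
        is_entrypoint = ctx is None
        if is_entrypoint:
            ctx = UsageContext()
            token = _usage_ctx.set(ctx)
        ctx.call_stack.append(func.__name__)
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Exceptions are recorded here, not in tracing_span.
            if ctx.exception is None:
                ctx.exception = repr(exc)
            raise
        finally:
            if is_entrypoint:
                # Only the outermost call exports, so nested calls share one event.
                _usage_ctx.reset(token)
                EXPORTED_EVENTS.append(ctx)

    return wrapper


@contextlib.contextmanager
def tracing_span(name):
    ctx = _usage_ctx.get()
    if ctx is None:
        # Assumption: fail loudly when used outside a @log_usage function.
        raise RuntimeError("tracing_span requires an active UsageContext")
    start = time.perf_counter()
    try:
        yield
    finally:
        # Timing is recorded whether or not the body raised.
        ctx.traces.append((name, time.perf_counter() - start))


@log_usage
def online_read(fail=False):
    with tracing_span("remote_call"):
        if fail:
            raise ValueError("boom")
        return "rows"


@log_usage
def get_online_features(fail=False):
    return online_read(fail)
```

Calling `get_online_features()` produces a single exported context whose call stack lists both functions and whose traces contain the `remote_call` span; a failing call still records the span and stores the exception once, at the decorator level.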
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: pyalex, tsotnet
Signed-off-by: pyalex [email protected]
What this PR does / why we need it:
This PR extends the existing usage collection system with time profiling. This will inform how we make step improvements in latency reduction, which can be a blocker for customer adoption.
In addition, I refactored the existing usage collection system to group all events produced for a single call stack into one event that shares useful context from all nested functions, making analysis much easier. For example, FeatureStore.get_online_features (the entrypoint function) starts by enriching the context with request-specific details (such as whether an ODFV was used in the request). It then calls Provider.online_read, which adds the type of provider and in turn calls Store.online_read, which adds the type of store and detailed traces of each remote call to the database. All this data is gathered in a single event and helps answer more comprehensive questions, such as the performance of a specific store implementation or the popularity of that store among all installations.
A usage event will have the following attributes:
Which issue(s) this PR fixes:
Fixes #
Does this PR introduce a user-facing change?: