feat(bigquery): add better timers around every API call #8626
Merged
hsheth2
merged 20 commits into
datahub-project:master
from
mayurinehate:bq_timers_api_refractor
Sep 15, 2023
Commits
b84601c
feat(bigquery): add better timers around every API call
mayurinehate 95bbcbe
wip, timers not added for unused methods - remove these ?
mayurinehate 76ddc3f
refractor in lineage.py
mayurinehate e57f134
report composition vs inheritance
mayurinehate 38b18bb
more refractor and fixes
mayurinehate 8d049a7
Merge branch 'master' into bq_timers_api_refractor
mayurinehate 31a3be8
fix lint, tests
mayurinehate eaa72a3
revert rename of bigquery_schema.py to bigquery_schema_api.py
mayurinehate ac2ab3b
Merge remote-tracking branch 'datahub-oss/master' into bq_timers_api_…
mayurinehate 94077db
Merge branch 'master' into bq_timers_api_refractor
mayurinehate e500275
Merge remote-tracking branch 'datahub-oss/master' into bq_timers_api_…
mayurinehate 1b3d5b5
move stateful check inside lineage module
mayurinehate c77a9ab
Merge branch 'master' into bq_timers_api_refractor
mayurinehate 6a2a3d4
merge related changes
mayurinehate 32fedfc
Merge branch 'master' into bq_timers_api_refractor
mayurinehate 9dca7e5
Merge branch 'master' into bq_timers_api_refractor
mayurinehate 79f84ba
address review comments
mayurinehate 25f4f0b
Merge branch 'master' into bq_timers_api_refractor
mayurinehate 0ba4efc
fix tests
mayurinehate ea2c4a1
Merge branch 'master' into bq_timers_api_refractor
mayurinehate
241 changes: 59 additions & 182 deletions
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery.py
Large diffs are not rendered by default.
139 changes: 139 additions & 0 deletions
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_audit_log_api.py
@@ -0,0 +1,139 @@
import logging
from datetime import datetime
from typing import Callable, Iterable, List, Optional

from google.cloud import bigquery
from google.cloud.logging_v2.client import Client as GCPLoggingClient
from ratelimiter import RateLimiter

from datahub.ingestion.source.bigquery_v2.bigquery_audit import (
    AuditLogEntry,
    BigQueryAuditMetadata,
)
from datahub.ingestion.source.bigquery_v2.bigquery_report import (
    BigQueryAuditLogApiPerfReport,
)
from datahub.ingestion.source.bigquery_v2.common import (
    BQ_DATE_SHARD_FORMAT,
    BQ_DATETIME_FORMAT,
)

logger: logging.Logger = logging.getLogger(__name__)


# Api interfaces are separated based on functionality they provide
# rather than the underlying bigquery client that is used to
# provide the functionality.
class BigQueryAuditLogApi:
    def __init__(
        self,
        report: BigQueryAuditLogApiPerfReport,
        rate_limit: bool,
        requests_per_min: int,
    ) -> None:
        self.report = report
        self.rate_limit = rate_limit
        self.requests_per_min = requests_per_min

    def get_exported_bigquery_audit_metadata(
        self,
        bigquery_client: bigquery.Client,
        bigquery_audit_metadata_query_template: Callable[
            [
                str,  # dataset: str
                bool,  # use_date_sharded_tables: bool
                Optional[int],  # limit: Optional[int] = None
            ],
            str,
        ],
        bigquery_audit_metadata_datasets: Optional[List[str]],
        use_date_sharded_audit_log_tables: bool,
        start_time: datetime,
        end_time: datetime,
        limit: Optional[int] = None,
    ) -> Iterable[BigQueryAuditMetadata]:
        if bigquery_audit_metadata_datasets is None:
            return

        audit_start_time = start_time.strftime(BQ_DATETIME_FORMAT)
        audit_start_date = start_time.strftime(BQ_DATE_SHARD_FORMAT)

        audit_end_time = end_time.strftime(BQ_DATETIME_FORMAT)
        audit_end_date = end_time.strftime(BQ_DATE_SHARD_FORMAT)

        rate_limiter: Optional[RateLimiter] = None
        if self.rate_limit:
            rate_limiter = RateLimiter(max_calls=self.requests_per_min, period=60)

        with self.report.get_exported_log_entries as current_timer:
            for dataset in bigquery_audit_metadata_datasets:
                logger.info(
                    f"Start loading log entries from BigQueryAuditMetadata in {dataset}"
                )

                query = bigquery_audit_metadata_query_template(
                    dataset,
                    use_date_sharded_audit_log_tables,
                    limit,
                ).format(
                    start_time=audit_start_time,
                    end_time=audit_end_time,
                    start_date=audit_start_date,
                    end_date=audit_end_date,
                )

                query_job = bigquery_client.query(query)
                logger.info(
                    f"Finished loading log entries from BigQueryAuditMetadata in {dataset}"
                )

                for entry in query_job:
                    with current_timer.pause():
                        if rate_limiter:
                            with rate_limiter:
                                yield entry
                        else:
                            yield entry

    def get_bigquery_log_entries_via_gcp_logging(
        self,
        client: GCPLoggingClient,
        filter: str,
        log_page_size: int,
        limit: Optional[int] = None,
    ) -> Iterable[AuditLogEntry]:
        logger.debug(filter)

        list_entries: Iterable[AuditLogEntry]
        rate_limiter: Optional[RateLimiter] = None
        if self.rate_limit:
            # client.list_entries is a generator, does api calls to GCP Logging when it runs out of entries and needs to fetch more from GCP Logging
            # to properly ratelimit we multiply the page size by the number of requests per minute
            rate_limiter = RateLimiter(
                max_calls=self.requests_per_min * log_page_size,
                period=60,
            )

        with self.report.list_log_entries as current_timer:
            list_entries = client.list_entries(
                filter_=filter,
                page_size=log_page_size,
                max_results=limit,
            )

            for i, entry in enumerate(list_entries):
                if i % 1000 == 0:
                    logger.info(
                        f"Loaded {i} log entries from GCP Log for {client.project}"
                    )

                with current_timer.pause():
                    if rate_limiter:
                        with rate_limiter:
                            yield entry
                    else:
                        yield entry

            logger.info(
                f"Finished loading log entries from GCP Log for {client.project}"
            )
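
Both generator methods in this file wrap each `yield` in `current_timer.pause()`, so the time the caller spends processing an entry is not charged to the API timer. The sketch below illustrates that pattern in isolation. `PausableTimer` and `fetch_rows` are hypothetical names written for this example and are not DataHub's timer implementation; `time.sleep` stands in for API latency and for a slow consumer.

```python
import time
from contextlib import contextmanager
from typing import Iterable, Iterator, List, Optional


class PausableTimer:
    """Hypothetical stand-in for a report timer; accumulates only un-paused time."""

    def __init__(self) -> None:
        self.elapsed = 0.0
        self._started_at: Optional[float] = None

    def __enter__(self) -> "PausableTimer":
        self._started_at = time.perf_counter()
        return self

    def __exit__(self, *exc_info: object) -> None:
        self._stop()

    def _stop(self) -> None:
        # Fold the currently running interval into the total.
        if self._started_at is not None:
            self.elapsed += time.perf_counter() - self._started_at
            self._started_at = None

    @contextmanager
    def pause(self) -> Iterator[None]:
        # Stop counting while the body runs, then resume.
        self._stop()
        try:
            yield
        finally:
            self._started_at = time.perf_counter()


def fetch_rows(timer: PausableTimer, rows: Iterable[int]) -> Iterator[int]:
    with timer:
        for row in rows:
            time.sleep(0.01)  # simulated API latency: counted by the timer
            with timer.pause():
                yield row  # the consumer runs here: not counted


timer = PausableTimer()
consumed: List[int] = []
wall_start = time.perf_counter()
for row in fetch_rows(timer, [1, 2, 3]):
    time.sleep(0.05)  # slow consumer work, excluded from timer.elapsed
    consumed.append(row)
wall = time.perf_counter() - wall_start
print(f"timer: {timer.elapsed:.3f}s, wall clock: {wall:.3f}s")
```

Without the `pause()` around `yield`, the timer would keep running while the generator is suspended, and the reported figure would include whatever the downstream pipeline does with each entry.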
This is fine here, but in general I'd like to avoid passing complex functions as arguments, because it's kinda ugly and hard to extend.
agree
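
For illustration only, one way to make a callable parameter like `bigquery_audit_metadata_query_template` easier to read and extend is to describe it with a `typing.Protocol` that names its arguments, instead of a positional `Callable[[str, bool, Optional[int]], str]`. This is a sketch of an alternative, not something the PR does. `AuditQueryTemplate`, `example_template`, `build_query`, and the `audit_table` table name are all hypothetical.

```python
from typing import Optional, Protocol


class AuditQueryTemplate(Protocol):
    """Hypothetical named signature for the query-template callable."""

    def __call__(
        self,
        dataset: str,
        use_date_sharded_tables: bool,
        limit: Optional[int] = None,
    ) -> str:
        ...


def example_template(
    dataset: str,
    use_date_sharded_tables: bool,
    limit: Optional[int] = None,
) -> str:
    # Placeholder query text; the table and column names are made up.
    suffix = "_*" if use_date_sharded_tables else ""
    limit_clause = f" LIMIT {limit}" if limit is not None else ""
    return (
        f"SELECT * FROM `{dataset}.audit_table{suffix}`"
        + " WHERE ts BETWEEN '{start_time}' AND '{end_time}'"
        + limit_clause
    )


def build_query(template: AuditQueryTemplate, dataset: str) -> str:
    # Keyword arguments document the call site, unlike positional Callable args.
    return template(
        dataset=dataset, use_date_sharded_tables=True, limit=10
    ).format(start_time="2023-09-01", end_time="2023-09-15")


query = build_query(example_template, "ds")
print(query)
```

A type checker can then verify that any function passed in matches the named signature, and adding an optional parameter later means changing the protocol in one place.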