Update fix/optimize historical hourly measurements #3444
Conversation
…ient pandas execution and code clean up
…surements Updates from airqo staging
Walkthrough
The pull request introduces significant modifications.

Changes
Actionable comments posted: 1
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (6)
- src/workflows/airqo_etl_utils/airqo_utils.py (1 hunks)
- src/workflows/airqo_etl_utils/bigquery_api.py (6 hunks)
- src/workflows/airqo_etl_utils/config.py (1 hunks)
- src/workflows/airqo_etl_utils/data_validator.py (1 hunks)
- src/workflows/airqo_etl_utils/date.py (1 hunks)
- src/workflows/dags/airqo_measurements.py (1 hunks)
Files skipped from review due to trivial changes (1)
- src/workflows/airqo_etl_utils/date.py
Additional context used
Ruff
src/workflows/airqo_etl_utils/bigquery_api.py
168-168: Do not use mutable data structures for argument defaults. Replace with `None`; initialize within function (B006)
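The B006 finding guards against a classic Python pitfall: a mutable default argument is created once, at function definition time, and shared across every call. A minimal standalone demonstration (not project code):

```python
# Demonstration of the pitfall behind Ruff's B006 rule: the default list
# is created once, at definition time, and shared by every call.
def append_bad(value, items=[]):
    items.append(value)
    return items

# The recommended pattern: default to None and create a fresh list inside.
def append_good(value, items=None):
    if items is None:
        items = []
    items.append(value)
    return items

print(append_bad(1))   # [1]
print(append_bad(2))   # [1, 2]  <- state leaked from the first call
print(append_good(1))  # [1]
print(append_good(2))  # [1]     <- fresh list each call
```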
Additional comments not posted (12)
src/workflows/airqo_etl_utils/data_validator.py (3)
85-85: LGTM! The change to pass a list of column types to the `get_columns` method is consistent with the other calls in the function and does not affect the core logic.
88-88: Looks good! Similar to the previous change, passing a list of column types to the `get_columns` method is consistent and does not affect the function's logic.
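As a rough illustration of the call-site pattern under review (the enum values, schema contents, and method body here are assumptions, not the project's actual implementation), `get_columns` now accepts a list of `ColumnDataType` values rather than a single enum:

```python
# Hypothetical sketch of get_columns accepting a list of column types.
# Enum values, schema, and filtering logic are assumptions for illustration.
from enum import Enum
from typing import List


class ColumnDataType(Enum):
    FLOAT = "FLOAT"
    TIMESTAMP = "TIMESTAMP"
    NONE = "NONE"


class BigQueryApiSketch:
    def __init__(self):
        # table -> {column name: declared type}
        self.schema = {"hourly": {"pm2_5": "FLOAT", "timestamp": "TIMESTAMP"}}

    def get_columns(
        self,
        table: str = "all",
        column_type: List[ColumnDataType] = None,
    ) -> List[str]:
        # Default handled inside the body to avoid a mutable default (B006).
        column_type = column_type or [ColumnDataType.NONE]
        cols = self.schema.get(table, {})
        if ColumnDataType.NONE in column_type:
            return list(cols)
        wanted = {t.value for t in column_type}
        return [name for name, dtype in cols.items() if dtype in wanted]


api = BigQueryApiSketch()
print(api.get_columns("hourly", [ColumnDataType.FLOAT]))  # ['pm2_5']
```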
91-93: Nicely done! The change to pass a list of column types to the `get_columns` method is consistent with the other calls in the function. Splitting the method call across multiple lines also improves readability without affecting the functionality.
src/workflows/airqo_etl_utils/config.py (1)
221-244: Excellent refactoring to improve code readability and maintainability! The change from using string literals to defined constants for the dictionary keys is a great enhancement. It makes the codebase clearer and less error-prone.
The functionality remains intact, as the mappings still point to the same JSON schema files. Overall, this is a positive change that improves the quality of the codebase without altering the behavior.
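A minimal sketch of the pattern this comment praises; the constant names and schema file names below are illustrative assumptions, not the project's actual values:

```python
# Illustrative sketch only: replacing string-literal dict keys with named
# constants. Constant names and schema file names are assumptions.
HOURLY_MEASUREMENTS = "hourly_measurements"
RAW_MEASUREMENTS = "raw_measurements"

# Before the refactor the mapping would have used raw strings:
#   SCHEMA_MAPPING = {"hourly_measurements": "hourly.json", ...}
SCHEMA_MAPPING = {
    HOURLY_MEASUREMENTS: "hourly.json",
    RAW_MEASUREMENTS: "raw.json",
}

# A typo in a constant name fails loudly (NameError); a typo in a string
# literal key would fail silently at lookup time.
print(SCHEMA_MAPPING[HOURLY_MEASUREMENTS])  # hourly.json
```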
src/workflows/dags/airqo_measurements.py (1)
38-40: The addition of the `dynamic_query` parameter aligns with the PR objective of leveraging BigQuery for improved efficiency. Please ensure that the `dynamic_query` functionality within the `extract_aggregated_raw_data` method is thoroughly tested and documented to avoid potential issues. To verify the impact of this change, consider running the following script:
Verification successful
The `dynamic_query` parameter is well-implemented and aligns with the PR objective. I've verified that the `dynamic_query` parameter is correctly implemented and documented in the codebase. Here are the key findings:
- The parameter is defined with a default value of `False`, ensuring backward compatibility.
- It's well-documented: "A boolean value to signal bypassing the automatic query composition to a more dynamic averaging approach."
- The code uses conditional statements based on `dynamic_query`, indicating different behavior when enabled.

While we couldn't locate the exact `extract_aggregated_raw_data` function or `AirQoDataUtils` class, the usage of `dynamic_query` in the codebase suggests it's being utilized as intended. Great job on implementing this feature! It should indeed contribute to improved efficiency by leveraging BigQuery's capabilities.
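The behavior described in these findings can be sketched as a flag that switches between the precomposed query path and a dynamic averaging path. The function name, table, and column names below are assumptions for illustration, not the project's actual code:

```python
# Hypothetical sketch of the dynamic_query switch; all names are assumptions.
def extract_aggregated_raw_data_sketch(
    start: str, end: str, dynamic_query: bool = False
) -> str:
    """Return the SQL to run; dynamic_query=False keeps the old path."""
    if dynamic_query:
        # Push the averaging work into BigQuery itself.
        return (
            "SELECT device_id, AVG(pm2_5) AS pm2_5 FROM raw "
            f"WHERE ts BETWEEN '{start}' AND '{end}' GROUP BY device_id"
        )
    # Default path: fetch raw rows and aggregate in pandas afterwards.
    return f"SELECT * FROM raw WHERE ts BETWEEN '{start}' AND '{end}'"


print(extract_aggregated_raw_data_sketch("2024-09-01", "2024-09-02"))
print(extract_aggregated_raw_data_sketch("2024-09-01", "2024-09-02", dynamic_query=True))
```

The default of `False` preserves the old behavior, which matches the backward-compatibility point above.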
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the usage and impact of the `dynamic_query` parameter.

# Test 1: Search for the usage of `dynamic_query` within `extract_aggregated_raw_data`.
# Expect: Proper utilization of the parameter to enable flexible querying.
ast-grep --lang python --pattern $'def extract_aggregated_raw_data($$$):
  $$$
  if dynamic_query:
    $$$
  $$$'

# Test 2: Compare query performance with and without `dynamic_query`.
# Expect: Improved performance with `dynamic_query` enabled.
echo "Manually test query performance and document the results."

# Test 3: Verify the documentation of the `dynamic_query` parameter.
# Expect: Clear explanation of the parameter's purpose and usage.
rg --type python $'dynamic_query'

Length of output: 303
Script:
#!/bin/bash
# Search for the extract_aggregated_raw_data function
echo "Searching for extract_aggregated_raw_data function:"
ast-grep --lang python --pattern 'def extract_aggregated_raw_data($$$):
  $$$'

# Search for dynamic_query in Python files
echo -e "\nSearching for dynamic_query in Python files:"
fd -e py -x rg 'dynamic_query' {}

# Search for AirQoDataUtils class
echo -e "\nSearching for AirQoDataUtils class:"
ast-grep --lang python --pattern 'class AirQoDataUtils($$$):
  $$$'

Length of output: 849
src/workflows/airqo_etl_utils/bigquery_api.py (6)
23-23: LGTM! Initializing the `schema_mapping` attribute from the configuration is a good practice for maintainability and flexibility.
139-139: LGTM! The changes to accept a list of `ColumnDataType` enums and retrieve columns based on the list of column types are logically correct and allow for more versatile column type filtering. Also applies to: 145-145, 151-151
166-168: LGTM! The changes to accept a list of column types and retrieve columns based on the list are logically correct and allow for more flexible column retrieval. The improved error handling for invalid tables is also a good addition.
Tools
Ruff
168-168: Do not use mutable data structures for argument defaults. Replace with `None`; initialize within function (B006)
518-539: LGTM! The `compose_query` method correctly composes a SQL query for BigQuery based on the query type and other parameters. The docstring provides a clear description of the method's parameters and return value.
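A stripped-down sketch of what a query composer of this shape might look like; the parameter names and supported query types below are assumptions based only on the review's description, not the project's API:

```python
# Illustrative sketch of a compose_query-style helper; names and query
# types are assumptions, not the project's actual implementation.
def compose_query_sketch(
    query_type: str, table: str, start: str, end: str, where: str = None
) -> str:
    """Compose a simple BigQuery-style SQL statement for a time window."""
    if query_type not in ("get", "delete"):
        raise ValueError(f"Invalid query type: {query_type}")
    verb = "SELECT *" if query_type == "get" else "DELETE"
    sql = f"{verb} FROM `{table}` WHERE timestamp BETWEEN '{start}' AND '{end}'"
    if where:
        sql += f" AND {where}"
    return sql


print(compose_query_sketch("get", "project.dataset.hourly", "2024-09-01", "2024-09-02"))
```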
617-621: LGTM! The changes to the `query_data` method, including the new `dynamic_query` parameter and the usage of the `dynamic_averaging_query` method, allow for a more dynamic approach to querying data. The `dynamic_averaging_query` method constructs a SQL query to average numeric columns dynamically, which can be useful for certain use cases. The docstring provides a clear description of the method's parameters and return value. Also applies to: 623-638, 639-653, 660-662
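A simplified sketch of how a dynamic averaging query of the kind described here might be built. The column names, table, granularity set, and rounding are assumptions for illustration; only the general shape (per-column `AVG`, truncated time bucket, granularity validation) follows the review's description:

```python
# Illustrative sketch of a dynamic averaging query builder; the exact
# columns, table, and granularities are assumptions.
from typing import List


def dynamic_averaging_query_sketch(
    table: str,
    numeric_columns: List[str],
    group_by: List[str],
    time_granularity: str = "HOUR",
) -> str:
    """Build a SQL query that averages each numeric column per time bucket."""
    valid = {"HOUR", "DAY", "WEEK", "MONTH"}
    granularity = time_granularity.upper()
    if granularity not in valid:
        raise ValueError(f"Invalid granularity: {time_granularity}. Valid: {valid}")
    avg_exprs = ", ".join(f"ROUND(AVG({c}), 4) AS {c}" for c in numeric_columns)
    group_exprs = ", ".join(group_by + ["period"])
    return (
        f"SELECT {', '.join(group_by)}, "
        f"TIMESTAMP_TRUNC(timestamp, {granularity}) AS period, {avg_exprs} "
        f"FROM `{table}` GROUP BY {group_exprs}"
    )


print(dynamic_averaging_query_sketch("project.dataset.raw", ["pm2_5", "pm10"], ["device_id"]))
```

Averaging in BigQuery this way means only the bucketed results travel back to the worker, which is the resource-saving idea the PR description states.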
665-737: LGTM! The `dynamic_averaging_query` method constructs a dynamic SQL query to select and average numeric columns based on the provided parameters, which can be useful for certain use cases. The validation for the time granularity parameter ensures that only valid values are accepted. The docstring provides a clear description of the method's parameters, return value, and includes an example usage, which enhances the method's usability and maintainability.
src/workflows/airqo_etl_utils/airqo_utils.py (1)
120-122: Verify the usage of the `dynamic_query` parameter. The addition of the `dynamic_query` parameter suggests support for different querying scenarios, which is a good enhancement. However, the parameter is not being utilized within the updated function body. Please ensure that the `dynamic_query` parameter is being used as intended. If it's not needed, consider removing it to avoid confusion.
- self, table: str = "all", column_type: ColumnDataType = ColumnDataType.NONE
+ self,
+ table: str = "all",
+ column_type: List[ColumnDataType] = [ColumnDataType.NONE],
Use `None` instead of a mutable data structure for the default argument.
The static analysis tool suggests using `None` instead of a mutable data structure for the default argument and initializing it within the function.
Apply this diff to fix the issue:
- def get_columns(
- self,
- table: str = "all",
- column_type: List[ColumnDataType] = [ColumnDataType.NONE],
- ) -> List[str]:
+ def get_columns(
+ self,
+ table: str = "all",
+ column_type: List[ColumnDataType] = None,
+ ) -> List[str]:
+ column_type = column_type or [ColumnDataType.NONE]
Do you want me to open a GitHub issue to track this change?
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def get_columns(
    self,
    table: str = "all",
    column_type: List[ColumnDataType] = None,
) -> List[str]:
    column_type = column_type or [ColumnDataType.NONE]
Tools
Ruff
168-168: Do not use mutable data structures for argument defaults. Replace with `None`; initialize within function (B006)
There is an implementation that requires a list just below.
@NicholasTurner23, understood. Thanks for the clarification.
(_/)
(•.•)
(⊃🔍⊂)
Learnings added
Learnt from: NicholasTurner23
PR: airqo-platform/AirQo-api#3444
File: src/workflows/airqo_etl_utils/bigquery_api.py:168-168
Timestamp: 2024-09-16T19:16:23.374Z
Learning: In the `get_columns` method of `bigquery_api.py`, the default value for `column_type` must be a list due to implementation requirements.
thanks @NicholasTurner23
Description
This PR reduces resource utilization on the server by pushing most of the load to BigQuery, which is optimized for data querying and light aggregation.
This should solve the `Negsignal.SIGKILL` errors, where the OS was killing the `celery` workers due to resource exhaustion.
Related Issues
Summary by CodeRabbit

New Features
- `dynamic_query` parameter for more flexible querying.

Bug Fixes

Refactor

Chores