Update fix/analytics data export cleanup #3819

NicholasTurner23 · 2024-11-08T14:07:26Z

Description

This PR improves error handling and prepares to update the query builder.

Related Issues

JIRA cards:
- OPS-309

Summary by CodeRabbit

New Features
- Enhanced filtering mechanism for data retrieval, allowing users to specify filter types and values.
- Improved error handling and validation for API requests, providing clearer error messages for invalid inputs.
Bug Fixes
- Streamlined error propagation in filtering functions, ensuring errors are raised with specific messages rather than being silently ignored.
Chores
- Added a new dependency for Google Cloud BigQuery Storage to improve data handling capabilities.

coderabbitai · 2024-11-08T14:07:34Z

📝 Walkthrough

Walkthrough

The pull request introduces significant modifications to the download_from_bigquery and data_export_query methods in the EventsModel class, consolidating parameters into a more flexible filtering mechanism. The filter_type and filter_value parameters replace multiple previous parameters, enhancing control flow and SQL query construction. Additionally, error handling in the filter_non_private_sites and filter_non_private_devices functions is refined to raise specific errors instead of logging them silently. Lastly, the DataExportResource class is updated with improved logging and validation processes for API requests.

Changes

| File Path | Change Summary |
| --- | --- |
| `src/analytics/api/models/events.py` | Updated `download_from_bigquery` and `data_export_query` method signatures to use `filter_type` and `filter_value` instead of `devices`, `sites`, and `airqlouds`. Control flow and SQL query logic adjusted accordingly. |
| `src/analytics/api/utils/data_formatters.py` | Modified `filter_non_private_sites` and `filter_non_private_devices` to raise `RuntimeError` with specific messages instead of logging and returning empty dictionaries. |
| `src/analytics/api/views/data.py` | Enhanced `DataExportResource` class with a new `_get_validated_filter` method for improved filter validation. Updated error handling in `post` method to raise `ValueError` for invalid options. Modified `_get_valid_option` method signature. |
| `src/analytics/requirements.txt` | Added new dependency: `google-cloud-bigquery-storage==2.27.0`. |

Possibly related PRs

Update fix/analytics data export cleanup #3816: This PR modifies the data_export_query method in the same events.py file, which is directly related to the changes made in the main PR regarding the data_export_query method's parameters and logic.

Suggested labels

ready for review

Suggested reviewers

Baalmart
BenjaminSsempala
Mnoble-19
Psalmz777

🎉 In the realm of data, changes unfold,
Parameters refined, a story retold.
With filters now clear, and errors in sight,
The code dances smoothly, a beautiful sight.
From events to exports, the flow's now refined,
In the world of analytics, progress we find! 📊✨

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b501ae2 and bee1c20.

📒 Files selected for processing (1)

src/analytics/api/models/events.py (9 hunks)

🧰 Additional context used

📓 Learnings (1)

src/analytics/api/models/events.py (1)

Learnt from: NicholasTurner23
PR: airqo-platform/AirQo-api#3819
File: src/analytics/api/models/events.py:47-48
Timestamp: 2024-11-08T14:25:36.133Z
Learning: Certain modules have a different version of `download_from_bigquery`, and updates to align their parameters will happen at a later time.

🔇 Additional comments (5)

src/analytics/api/models/events.py (5)

47-48: LGTM: Well-structured parameter consolidation.

The consolidation of multiple filter parameters into filter_type and filter_value improves the API design. The parameters are well-documented with clear descriptions.

72-73: LGTM: Improved error handling with specific exception.

The change from generic Exception to specific ValueError with a clear message improves error handling and debugging.

65-70: LGTM: Clean table mapping implementation.

The dictionary-based table mapping improves code organization and makes frequency-based table selection more maintainable.
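The two comments above (table mapping and `ValueError` on an invalid frequency) can be sketched together as follows. This is only an illustration: the table identifiers and the function name `resolve_table` are hypothetical, since the review does not show the actual attributes used in `EventsModel`.

```python
# Hypothetical table identifiers; the real ones are defined in EventsModel.
DATA_TABLES = {
    "raw": "project.dataset.raw_measurements",
    "hourly": "project.dataset.hourly_measurements",
    "daily": "project.dataset.daily_measurements",
}


def resolve_table(frequency: str) -> str:
    """Return the table for a frequency, or raise ValueError if it is unknown."""
    table = DATA_TABLES.get(frequency)
    if table is None:
        valid = ", ".join(sorted(DATA_TABLES))
        raise ValueError(
            f"Invalid frequency '{frequency}'. Valid values are: {valid}."
        )
    return table
```

Adding a new frequency then only requires a new dictionary entry, and callers get a specific exception type they can map to a client error.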

253-257: LGTM: Secure query parameter handling.

The use of QueryJobConfig with ArrayQueryParameter for filter_value properly addresses SQL injection concerns raised in previous reviews.
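A minimal sketch of the parameterised approach this comment refers to, under stated assumptions: the column names, the `FILTER_COLUMNS` mapping, and both function names are hypothetical and not taken from the PR. Only `bigquery.QueryJobConfig` and `bigquery.ArrayQueryParameter` are real `google-cloud-bigquery` classes, and the import is deferred so the first function can be used without the library installed.

```python
from typing import Any, Dict, List, Tuple

# Column names cannot be bound as query parameters, so they come from a
# fixed whitelist instead of user input. These names are hypothetical.
FILTER_COLUMNS = {
    "sites": "site_id",
    "devices": "device_id",
    "airqlouds": "airqloud_id",
}


def build_filter_clause(
    filter_type: str, filter_value: List[str]
) -> Tuple[str, Dict[str, Any]]:
    """Return a WHERE clause with a named placeholder plus its parameter spec."""
    column = FILTER_COLUMNS.get(filter_type)
    if column is None:
        raise ValueError(f"Unsupported filter type: {filter_type}")
    # User-supplied values never appear in the SQL text, only the placeholder.
    clause = f"WHERE {column} IN UNNEST(@filter_value)"
    param = {
        "name": "filter_value",
        "array_type": "STRING",
        "values": list(filter_value),
    }
    return clause, param


def to_job_config(param: Dict[str, Any]):
    """Wrap the parameter spec in a job config (requires google-cloud-bigquery)."""
    from google.cloud import bigquery  # deferred so the sketch loads without it

    return bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ArrayQueryParameter(
                param["name"], param["array_type"], param["values"]
            )
        ]
    )
```

The resulting config would be passed as `job_config` when submitting the query, so the values travel separately from the SQL string.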

Line range hint 119-132: Verify consistent use of parameterized queries.

The implementation consistently uses parameterized queries for filter values across all query paths, which is secure against SQL injection.

Also applies to: 173-187, 200-208

coderabbitai

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (3)

src/analytics/api/utils/data_formatters.py (2)
323-323: Enhance exception chaining for better debugging.

I appreciate the improvement in error propagation. To make debugging even easier, let's maintain the exception chain using Python's exception chaining syntax.
-        raise RuntimeError(f"Error while filtering non private sites {rex}")
+        raise RuntimeError(f"Error while filtering non private sites {rex}") from rex
Additionally, consider adding a return type hint to the function signature:
def filter_non_private_sites(sites: List[str]) -> Dict[str, Union[List[str], Any]]:
🧰 Tools

🪛 Ruff

323-323: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)
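To illustrate what the `from rex` suffix changes, here is a small self-contained sketch. The helper `fetch_sites` and the function name `filter_sites_sketch` are hypothetical stand-ins, not the code in `data_formatters.py`.

```python
def fetch_sites(sites):
    """Hypothetical upstream call; it always fails in this sketch."""
    raise ConnectionError("upstream unavailable")


def filter_sites_sketch(sites):
    try:
        return fetch_sites(sites)
    except ConnectionError as rex:
        # "from rex" stores the original error on __cause__, so the traceback
        # reports it as the direct cause of the RuntimeError.
        raise RuntimeError(
            f"Error while filtering non private sites {rex}"
        ) from rex
```

A caller that catches the `RuntimeError` can still inspect `exc.__cause__` to see the underlying `ConnectionError`.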

351-351: Enhance exception handling and documentation.

Great improvement in error handling! Let's polish it further with these enhancements:

Maintain exception chain:
-        raise RuntimeError(f"Error while filtering non private devices {rex}")
+        raise RuntimeError(f"Error while filtering non private devices {rex}") from rex
Fix docstring typo and parameter description:
     """
-    FilterS out private device IDs from a provided array of device IDs.
+    Filters out private device IDs from a provided array of device IDs.

     Args:
-        entities(List[str]): List of device/site ids to filter against.
+        devices(List[str]): List of device IDs to filter against.
Add return type hint:
def filter_non_private_devices(devices: List[str]) -> Dict[str, Union[List[str], Any]]:
🧰 Tools

🪛 Ruff

351-351: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)
src/analytics/api/views/data.py (1)
115-117: Remove unused variable 'data_type'

The variable data_type is assigned but never used. Consider removing it to clean up the code.

Apply this diff to remove the unused variable:
-        data_type = self._get_valid_option(
-            json_data.get("datatype"), valid_options["data_types"], "datatype"
-        )
🧰 Tools

🪛 Ruff

115-115: Local variable data_type is assigned to but never used

Remove assignment to unused variable data_type

(F841)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4425222 and b501ae2.

📒 Files selected for processing (4)

src/analytics/api/models/events.py (8 hunks)
src/analytics/api/utils/data_formatters.py (2 hunks)
src/analytics/api/views/data.py (5 hunks)
src/analytics/requirements.txt (1 hunks)

✅ Files skipped from review due to trivial changes (1)

src/analytics/requirements.txt

🔇 Additional comments (9)

src/analytics/api/views/data.py (6)

3-3: Proper initialization of logging

The addition of the logging import and logger initialization sets up logging correctly for this module.

Also applies to: 37-37

93-97: Enhanced error handling for filter validation

The new try-except block correctly handles ValueError exceptions from _get_validated_filter, returning an appropriate HTTP 400 response.
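The pattern described here, turning a validation `ValueError` into an HTTP 400 response, might look roughly like the sketch below. The validator, the response shape, and the function names are assumptions made for illustration; the actual `post` method in `DataExportResource` is not shown in this review.

```python
from typing import Any, Dict, Tuple


def validate_filter(json_data: Dict[str, Any]) -> Tuple[str, list]:
    """Hypothetical validator that raises ValueError on bad input."""
    sites = json_data.get("sites")
    if not sites:
        raise ValueError("Specify a non-empty 'sites' list.")
    return "sites", sites


def post(json_data: Dict[str, Any]) -> Tuple[Dict[str, Any], int]:
    """Return a (body, status) pair, mapping validation errors to HTTP 400."""
    try:
        filter_type, filter_value = validate_filter(json_data)
    except ValueError as exc:
        # The exception message is surfaced to the client as-is.
        return {"status": "error", "message": str(exc)}, 400
    return {"status": "success", "filter": {filter_type: filter_value}}, 200
```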

101-120: Improved error handling for option validation

The try-except block ensures that invalid options are gracefully handled, providing clear error messages to the user.

124-126: Verify default value for 'minimum_output'

The minimum_output parameter defaults to True when not provided. Please confirm that this default aligns with the intended behavior and that downstream processes handle this value appropriately.

141-142: Updated data retrieval with consolidated filters

The call to download_from_bigquery now correctly uses filter_type and filter_value, aligning with the updated filtering mechanism in EventsModel.

196-235: Effective validation of filter parameters

The _get_validated_filter method ensures that exactly one of 'sites', 'devices', or 'airqlouds' is provided, enhancing input validation and preventing ambiguous requests.
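As a rough sketch of the "exactly one filter" rule, assuming a plain dictionary request body: the function below is a hypothetical standalone version and omits the private site/device filtering that the real `_get_validated_filter` performs afterwards.

```python
from typing import Any, Dict, List, Tuple

FILTER_KEYS = ("sites", "devices", "airqlouds")


def get_validated_filter(json_data: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Return (filter_type, filter_value) when exactly one filter is supplied."""
    # Keys that are missing or map to an empty list count as not provided.
    provided = [key for key in FILTER_KEYS if json_data.get(key)]
    if len(provided) != 1:
        raise ValueError(
            "Specify exactly one of 'sites', 'devices' or 'airqlouds'."
        )
    filter_type = provided[0]
    return filter_type, list(json_data[filter_type])
```

Rejecting both the zero-filter and multi-filter cases up front means the query builder only ever sees a single, unambiguous `filter_type`.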

src/analytics/api/models/events.py (3)

65-71: Efficient mapping of data tables based on frequency

The use of a dictionary to map frequencies to their corresponding data tables enhances code readability and maintainability. This dynamic selection simplifies future additions or modifications to frequency options.

72-73: Improved exception handling for invalid frequency

Raising a ValueError when an invalid frequency is provided offers clearer feedback and aids in debugging. This is a good practice for robust error handling.

200-201: Clarify the sorting columns when filtering by airqlouds

Including airqloud_id in sorting_cols when filter_type is 'airqlouds' ensures the data is sorted correctly. This enhances data consistency and readability in the final output.

coderabbitai · 2024-11-08T14:15:23Z

src/analytics/api/views/data.py

+        if filter_type == "sites":
+            validated_value = filter_non_private_sites(sites=filter_value).get(
+                "sites", []
+            )
+        elif filter_type == "devices":
+            validated_value = filter_non_private_devices(devices=filter_value).get(
+                "devices", []
+            )
+        else:
+            # No additional processing is needed for 'airqlouds'
+            validated_value = filter_value
+


🛠️ Refactor suggestion

Handle cases where filters result in empty lists

After filtering non-private sites or devices, validated_value may be empty. Consider adding checks to raise a ValueError if no valid items are found, providing clear feedback to the user.

Apply this diff to add empty checks:

         if filter_type == "sites":
             validated_value = filter_non_private_sites(sites=filter_value).get(
                 "sites", []
             )
+            if not validated_value:
+                raise ValueError("No valid sites found after filtering.")
         elif filter_type == "devices":
             validated_value = filter_non_private_devices(devices=filter_value).get(
                 "devices", []
             )
+            if not validated_value:
+                raise ValueError("No valid devices found after filtering.")
         else:
             # No additional processing is needed for 'airqlouds'
             validated_value = filter_value
+            if not validated_value:
+                raise ValueError("No valid airqlouds provided.")

coderabbitai · 2024-11-08T14:15:23Z

src/analytics/api/views/data.py

+    def _get_valid_option(self, option, valid_options, option_name):
        """
-        Returns a validated option, defaulting to the first valid option if not provided or invalid.
+        Returns a validated option, raising an error with valid options if invalid.

        Args:
            option (str): Option provided in the request.
            valid_options (list): List of valid options.
+            option_name (str): The name of the option being validated.

        Returns:
            str: A validated option from the list.
+
+        Raises:
+            ValueError: If the provided option is invalid.
        """
-        return (
-            option.lower()
-            if option and option.lower() in valid_options
-            else valid_options[0]
-        )
+        if option and option.lower() in valid_options:
+            return option.lower()
+        if option:
+            raise ValueError(
+                f"Invalid {option_name}. Valid values are: {', '.join(valid_options)}."
+            )


🛠️ Refactor suggestion

Assign default values when options are missing

The _get_valid_option method returns None if an option is not provided, which may lead to issues if downstream code does not handle None values. Consider assigning default values from valid_options to ensure consistent behavior.

Modify the method to assign default values:

     def _get_valid_option(self, option, valid_options, option_name):
         """
         Returns a validated option, raising an error with valid options if invalid.

         Args:
             option (str): Option provided in the request.
             valid_options (list): List of valid options.
             option_name (str): The name of the option being validated.

         Returns:
             str: A validated option from the list.

         Raises:
             ValueError: If the provided option is invalid.
         """
+        if not option:
+            return valid_options[0]
         elif option.lower() in valid_options:
             return option.lower()
         else:
             raise ValueError(
                 f"Invalid {option_name}. Valid values are: {', '.join(valid_options)}."
             )

src/analytics/api/models/events.py

NicholasTurner23 added 3 commits November 8, 2024 16:11

Clean up and error handling

4d787a1

WIP to easy readability and usability

2886f84

Updates from airqo staging

b501ae2

NicholasTurner23 requested review from Baalmart, BenjaminSsempala, OchiengPaul442 and Mnoble-19 November 8, 2024 14:07

coderabbitai bot reviewed Nov 8, 2024

Parameterise filter values for security

bee1c20

Baalmart merged commit f34a892 into airqo-platform:staging Nov 8, 2024
43 of 44 checks passed

Baalmart mentioned this pull request Nov 8, 2024

move to production #3818

Merged

2 tasks

coderabbitai bot mentioned this pull request Nov 11, 2024

Update fix/analytics data export cleanup #3831

Merged

This was referenced Nov 20, 2024

Update fix/analytics data export cleanup #3867

Merged

Update fix/analytics data export cleanup #3871

Merged

Update fix/analytics data export cleanup #3927

Merged

This was referenced Dec 4, 2024

Update fix/analytics data export cleanup #3996

Merged

Update fix/analytics data export cleanup #4037

Merged

This was referenced Dec 12, 2024

Update fix/clean up #4058

Merged

Update fix/clean up #4062

Merged
