Update fix/analytics data export cleanup #3819

Conversation

@NicholasTurner23 (Contributor) commented Nov 8, 2024

Description

This PR improves error handling and prepares to update the query builder.

Related Issues

  • JIRA cards:
    • OPS-309

Summary by CodeRabbit

  • New Features

    • Enhanced filtering mechanism for data retrieval, allowing users to specify filter types and values.
    • Improved error handling and validation for API requests, providing clearer error messages for invalid inputs.
  • Bug Fixes

    • Streamlined error propagation in filtering functions, ensuring errors are raised with specific messages rather than being silently ignored.
  • Chores

    • Added a new dependency for Google Cloud BigQuery Storage to improve data handling capabilities.

coderabbitai bot (Contributor) commented Nov 8, 2024

Walkthrough

The pull request introduces significant modifications to the download_from_bigquery and data_export_query methods in the EventsModel class, consolidating parameters into a more flexible filtering mechanism. The filter_type and filter_value parameters replace multiple previous parameters, enhancing control flow and SQL query construction. Additionally, error handling in the filter_non_private_sites and filter_non_private_devices functions is refined to raise specific errors instead of logging them silently. Lastly, the DataExportResource class is updated with improved logging and validation processes for API requests.

Changes

| File Path | Change Summary |
| --- | --- |
| src/analytics/api/models/events.py | Updated download_from_bigquery and data_export_query method signatures to use filter_type and filter_value instead of devices, sites, and airqlouds. Control flow and SQL query logic adjusted accordingly. |
| src/analytics/api/utils/data_formatters.py | Modified filter_non_private_sites and filter_non_private_devices to raise RuntimeError with specific messages instead of logging and returning empty dictionaries. |
| src/analytics/api/views/data.py | Enhanced DataExportResource class with a new _get_validated_filter method for improved filter validation. Updated error handling in post method to raise ValueError for invalid options. Modified _get_valid_option method signature. |
| src/analytics/requirements.txt | Added new dependency: google-cloud-bigquery-storage==2.27.0. |

Possibly related PRs

  • Update fix/analytics data export cleanup #3816: This PR modifies the data_export_query method in the same events.py file, which is directly related to the changes made in the main PR regarding the data_export_query method's parameters and logic.

Suggested labels

ready for review

Suggested reviewers

  • Baalmart
  • BenjaminSsempala
  • Mnoble-19
  • Psalmz777

🎉 In the realm of data, changes unfold,
Parameters refined, a story retold.
With filters now clear, and errors in sight,
The code dances smoothly, a beautiful sight.
From events to exports, the flow's now refined,
In the world of analytics, progress we find! 📊✨

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b501ae2 and bee1c20.

📒 Files selected for processing (1)
  • src/analytics/api/models/events.py (9 hunks)
🧰 Additional context used
📓 Learnings (1)
src/analytics/api/models/events.py (1)
Learnt from: NicholasTurner23
PR: airqo-platform/AirQo-api#3819
File: src/analytics/api/models/events.py:47-48
Timestamp: 2024-11-08T14:25:36.133Z
Learning: Certain modules have a different version of `download_from_bigquery`, and updates to align their parameters will happen at a later time.
🔇 Additional comments (5)
src/analytics/api/models/events.py (5)

47-48: LGTM: Well-structured parameter consolidation.

The consolidation of multiple filter parameters into filter_type and filter_value improves the API design. The parameters are well-documented with clear descriptions.


72-73: LGTM: Improved error handling with specific exception.

The change from generic Exception to specific ValueError with a clear message improves error handling and debugging.


65-70: LGTM: Clean table mapping implementation.

The dictionary-based table mapping improves code organization and makes frequency-based table selection more maintainable.
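
The two comments above (the frequency-to-table mapping and the ValueError for an unknown frequency) might look roughly like the following minimal sketch. The frequency keys, the table names, and the helper name `resolve_data_table` are assumptions for illustration; the PR's actual values are not shown in this conversation.

```python
# Hypothetical frequency -> table mapping; the real table names come from
# the project's configuration and are not shown in this review.
DATA_TABLES = {
    "raw": "project.dataset.raw_measurements",
    "hourly": "project.dataset.hourly_measurements",
    "daily": "project.dataset.daily_measurements",
}


def resolve_data_table(frequency: str) -> str:
    """Return the table for a frequency, or raise ValueError if it is unknown."""
    table = DATA_TABLES.get(frequency)
    if table is None:
        valid = ", ".join(DATA_TABLES)
        raise ValueError(
            f"Invalid frequency '{frequency}'. Valid values are: {valid}."
        )
    return table
```

Adding a new frequency then only requires a new dictionary entry, and callers get a specific exception type they can translate into a client error.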


253-257: LGTM: Secure query parameter handling.

The use of QueryJobConfig with ArrayQueryParameter for filter_value properly addresses SQL injection concerns raised in previous reviews.


Line range hint 119-132: Verify consistent use of parameterized queries.

The implementation consistently uses parameterized queries for filter values across all query paths, which is secure against SQL injection.

Also applies to: 173-187, 200-208
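
As a rough illustration of the parameterized approach described above, the sketch below keeps user-supplied values out of the SQL text and passes them as an array parameter. The column names `site_id` and `device_id`, the query shape, and both helper names are assumptions (only `airqloud_id` is named in this review); `QueryJobConfig` and `ArrayQueryParameter` are the google-cloud-bigquery classes the comment refers to.

```python
from typing import List, Tuple

# Assumed column names per filter type; only airqloud_id appears in the review.
FILTER_COLUMNS = {
    "sites": "site_id",
    "devices": "device_id",
    "airqlouds": "airqloud_id",
}


def build_filter_query(
    table: str, filter_type: str, filter_value: List[str]
) -> Tuple[str, List[str]]:
    """Build SQL with a named array placeholder instead of inlined values."""
    if filter_type not in FILTER_COLUMNS:
        raise ValueError(f"Invalid filter type '{filter_type}'.")
    column = FILTER_COLUMNS[filter_type]
    # The table and column come from trusted mappings, never from user input.
    sql = f"SELECT * FROM `{table}` WHERE {column} IN UNNEST(@filter_value)"
    return sql, list(filter_value)


def make_job_config(values: List[str]):
    """Wrap the values in a BigQuery job config (needs google-cloud-bigquery)."""
    from google.cloud import bigquery  # imported lazily so the sketch loads without it

    return bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ArrayQueryParameter("filter_value", "STRING", values)
        ]
    )
```

A caller would then run something like `client.query(sql, job_config=make_job_config(values))`, so a hostile value is only ever treated as data.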


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (3)
src/analytics/api/utils/data_formatters.py (2)

323-323: Enhance exception chaining for better debugging.

I appreciate the improvement in error propagation. To make debugging even easier, let's maintain the exception chain using Python's exception chaining syntax.

-        raise RuntimeError(f"Error while filtering non private sites {rex}")
+        raise RuntimeError(f"Error while filtering non private sites {rex}") from rex

Additionally, consider adding a return type hint to the function signature:

def filter_non_private_sites(sites: List[str]) -> Dict[str, Union[List[str], Any]]:
🧰 Tools
🪛 Ruff

323-323: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)
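
To show what the `from rex` suffix changes, here is a small self-contained sketch. `fetch_sites` and `filter_non_private_sites_sketch` are hypothetical stand-ins, and `ConnectionError` is an assumed exception type, since the real lookup and the exception it catches are not shown here.

```python
def fetch_sites(sites):
    """Hypothetical stand-in for the real lookup; always fails for the demo."""
    raise ConnectionError("upstream service unavailable")


def filter_non_private_sites_sketch(sites):
    try:
        return fetch_sites(sites)
    except ConnectionError as rex:
        # "from rex" stores the original exception on __cause__, so the
        # traceback reports it as the direct cause of the RuntimeError.
        raise RuntimeError(
            f"Error while filtering non private sites {rex}"
        ) from rex
```

Without the explicit `from`, Python still records the original exception on `__context__`, but the traceback describes it as an error that occurred while handling another error, which is the ambiguity B904 flags.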


351-351: Enhance exception handling and documentation.

Great improvement in error handling! Let's polish it further with these enhancements:

  1. Maintain exception chain:
-        raise RuntimeError(f"Error while filtering non private devices {rex}")
+        raise RuntimeError(f"Error while filtering non private devices {rex}") from rex
  2. Fix docstring typo and parameter description:
     """
-    FilterS out private device IDs from a provided array of device IDs.
+    Filters out private device IDs from a provided array of device IDs.

     Args:
-        entities(List[str]): List of device/site ids to filter against.
+        devices(List[str]): List of device IDs to filter against.
  3. Add return type hint:
def filter_non_private_devices(devices: List[str]) -> Dict[str, Union[List[str], Any]]:
🧰 Tools
🪛 Ruff

351-351: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

src/analytics/api/views/data.py (1)

115-117: Remove unused variable 'data_type'

The variable data_type is assigned but never used. Consider removing it to clean up the code.

Apply this diff to remove the unused variable:

-        data_type = self._get_valid_option(
-            json_data.get("datatype"), valid_options["data_types"], "datatype"
-        )
🧰 Tools
🪛 Ruff

115-115: Local variable data_type is assigned to but never used

Remove assignment to unused variable data_type

(F841)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4425222 and b501ae2.

📒 Files selected for processing (4)
  • src/analytics/api/models/events.py (8 hunks)
  • src/analytics/api/utils/data_formatters.py (2 hunks)
  • src/analytics/api/views/data.py (5 hunks)
  • src/analytics/requirements.txt (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/analytics/requirements.txt
🔇 Additional comments (9)
src/analytics/api/views/data.py (6)

3-3: Proper initialization of logging

The addition of the logging import and logger initialization sets up logging correctly for this module.

Also applies to: 37-37


93-97: Enhanced error handling for filter validation

The new try-except block correctly handles ValueError exceptions from _get_validated_filter, returning an appropriate HTTP 400 response.
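
The pattern described here, turning a validation ValueError into an HTTP 400 response, can be sketched without any web framework. The response shape, the `(body, status)` tuple convention, and both function names are assumptions for illustration, not the PR's actual code.

```python
from typing import Any, Dict, List, Tuple


def validate_sites(json_data: Dict[str, Any]) -> List[str]:
    """Hypothetical validator: requires a non-empty 'sites' list."""
    sites = json_data.get("sites")
    if not sites:
        raise ValueError("Specify at least one site.")
    return list(sites)


def handle_export_request(json_data: Dict[str, Any]) -> Tuple[Dict[str, Any], int]:
    """Return (body, status); validation failures become a 400 response."""
    try:
        sites = validate_sites(json_data)
    except ValueError as err:
        # The validator's message is passed straight to the client.
        return {"status": "error", "message": str(err)}, 400
    return {"status": "success", "sites": sites}, 200
```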


101-120: Improved error handling for option validation

The try-except block ensures that invalid options are gracefully handled, providing clear error messages to the user.


124-126: Verify default value for 'minimum_output'

The minimum_output parameter defaults to True when not provided. Please confirm that this default aligns with the intended behavior and that downstream processes handle this value appropriately.


141-142: Updated data retrieval with consolidated filters

The call to download_from_bigquery now correctly uses filter_type and filter_value, aligning with the updated filtering mechanism in EventsModel.


196-235: Effective validation of filter parameters

The _get_validated_filter method ensures that exactly one of 'sites', 'devices', or 'airqlouds' is provided, enhancing input validation and preventing ambiguous requests.
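
A minimal sketch of the "exactly one filter" rule described above, assuming the request body is a plain dictionary. The function name, error message, and return shape are illustrative, and the private-entity filtering step that follows in the real method is left out.

```python
from typing import Any, Dict, List, Tuple

VALID_FILTERS = ("sites", "devices", "airqlouds")


def get_validated_filter(json_data: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Return (filter_type, filter_value) when exactly one filter is given."""
    # Keys that are missing or hold an empty list count as "not provided".
    provided = [key for key in VALID_FILTERS if json_data.get(key)]
    if len(provided) != 1:
        raise ValueError(
            "Specify exactly one of 'sites', 'devices', or 'airqlouds' "
            "in the request body."
        )
    filter_type = provided[0]
    return filter_type, json_data[filter_type]
```

Returning the pair together lets the caller pass `filter_type` and `filter_value` straight through to the query layer.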

src/analytics/api/models/events.py (3)

65-71: Efficient mapping of data tables based on frequency

The use of a dictionary to map frequencies to their corresponding data tables enhances code readability and maintainability. This dynamic selection simplifies future additions or modifications to frequency options.


72-73: Improved exception handling for invalid frequency

Raising a ValueError when an invalid frequency is provided offers clearer feedback and aids in debugging. This is a good practice for robust error handling.


200-201: Clarify the sorting columns when filtering by airqlouds

Including airqloud_id in sorting_cols when filter_type is 'airqlouds' ensures the data is sorted correctly. This enhances data consistency and readability in the final output.

Comment on lines +222 to +233
        if filter_type == "sites":
            validated_value = filter_non_private_sites(sites=filter_value).get(
                "sites", []
            )
        elif filter_type == "devices":
            validated_value = filter_non_private_devices(devices=filter_value).get(
                "devices", []
            )
        else:
            # No additional processing is needed for 'airqlouds'
            validated_value = filter_value


🛠️ Refactor suggestion

Handle cases where filters result in empty lists

After filtering non-private sites or devices, validated_value may be empty. Consider adding checks to raise a ValueError if no valid items are found, providing clear feedback to the user.

Apply this diff to add empty checks:

        if filter_type == "sites":
            validated_value = filter_non_private_sites(sites=filter_value).get(
                "sites", []
            )
+           if not validated_value:
+               raise ValueError("No valid sites found after filtering.")
        elif filter_type == "devices":
            validated_value = filter_non_private_devices(devices=filter_value).get(
                "devices", []
            )
+           if not validated_value:
+               raise ValueError("No valid devices found after filtering.")
        else:
            # No additional processing is needed for 'airqlouds'
            validated_value = filter_value
+           if not validated_value:
+               raise ValueError("No valid airqlouds provided.")

Comment on lines +236 to +256
     def _get_valid_option(self, option, valid_options, option_name):
         """
-        Returns a validated option, defaulting to the first valid option if not provided or invalid.
+        Returns a validated option, raising an error with valid options if invalid.

         Args:
             option (str): Option provided in the request.
             valid_options (list): List of valid options.
             option_name (str): The name of the option being validated.

         Returns:
             str: A validated option from the list.

         Raises:
             ValueError: If the provided option is invalid.
         """
-        return (
-            option.lower()
-            if option and option.lower() in valid_options
-            else valid_options[0]
-        )
+        if option and option.lower() in valid_options:
+            return option.lower()
+        if option:
+            raise ValueError(
+                f"Invalid {option_name}. Valid values are: {', '.join(valid_options)}."
+            )


🛠️ Refactor suggestion

Assign default values when options are missing

The _get_valid_option method returns None if an option is not provided, which may lead to issues if downstream code does not handle None values. Consider assigning default values from valid_options to ensure consistent behavior.

Modify the method to assign default values:

    def _get_valid_option(self, option, valid_options, option_name):
        """
        Returns a validated option, raising an error with valid options if invalid.

        Args:
            option (str): Option provided in the request.
            valid_options (list): List of valid options.
            option_name (str): The name of the option being validated.

        Returns:
            str: A validated option from the list.

        Raises:
            ValueError: If the provided option is invalid.
        """
+       if not option:
+           return valid_options[0]
        elif option.lower() in valid_options:
            return option.lower()
        else:
            raise ValueError(
                f"Invalid {option_name}. Valid values are: {', '.join(valid_options)}."
            )

@Baalmart Baalmart merged commit f34a892 into airqo-platform:staging Nov 8, 2024
43 of 44 checks passed
2 participants