Update/integration iqair devices #3981

Conversation

NicholasTurner23 (Contributor) commented Dec 3, 2024

Description

Updates the configurations for BAM and low-cost gas sensors.

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced data extraction and processing for device data.
    • New mappings for environmental data fields introduced.
    • Added a new network field to the BAM raw measurements schema.
    • Introduced a function to update the latest data topic in the realtime measurements DAG.
  • Bug Fixes

    • Improved error handling for device fetching and data processing.
  • Documentation

    • Updated JSON schema formatting for better readability.

These changes aim to improve data accuracy, enhance configurability, and streamline data processing workflows for users.

coderabbitai bot (Contributor) commented Dec 3, 2024

Walkthrough

The pull request introduces significant modifications across several files related to data extraction, processing, and validation within the AirQo ETL utilities. Key updates include enhancements to the AirQoDataUtils class for improved data handling and error logging, the introduction of new mappings in the Config class, and refinements in the DataValidationUtils class for better data type formatting. Additionally, changes to the JSON schema for bam_raw_measurements and updates to Airflow DAGs reflect a shift in how data is processed and managed, particularly concerning device data.

Changes

| File Path | Change Summary |
| --- | --- |
| src/workflows/airqo_etl_utils/airqo_utils.py | Updated AirQoDataUtils class methods for data extraction and error logging; refactored data handling logic. |
| src/workflows/airqo_etl_utils/config.py | Added AIRQO_BAM_MAPPING_NEW and AIRQO_LOW_COST_GAS_FIELD_MAPPING for improved data mapping. |
| src/workflows/airqo_etl_utils/data_validator.py | Enhanced DataValidationUtils methods for better data formatting and handling of specific columns. |
| src/workflows/airqo_etl_utils/schema/bam_raw_measurements.json | Added network field and updated formatting for existing fields in the JSON schema. |
| src/workflows/dags/airqo_bam_measurements.py | Modified DAGs for date-time retrieval and added update_latest_data_topic function in realtime measurements. |

Possibly related PRs

Suggested reviewers

  • Baalmart
  • BenjaminSsempala
  • Psalmz777

🎉 In the realm of data, changes unfold,
With mappings anew and stories retold.
From devices to schemas, the flow's refined,
Error logs shining, clarity aligned.
In the dance of the DAGs, new functions arise,
Celebrating progress, we reach for the skies! 🌟


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
src/workflows/airqo_etl_utils/airqo_utils.py (1)

228-243: Consider optimizing the data mapping logic.

The code handles nested data structures well, but there are a few opportunities for improvement:

  1. The None default in data_mapping.get(key, None) is redundant as .get() returns None by default.
  2. The nested conditionals could be simplified using early returns.

Consider this refactoring:

-            if isinstance(entry, dict):
-                for key, value_data in entry.items():
-                    target_key = data_mapping.get(key, None)
-                    target_value = None
-                    if isinstance(target_key, dict):
-                        target_value = target_key.get("value")
-                        target_key = target_key.get("key")
-
-                    if target_key and target_key not in row_data:
-                        if isinstance(value_data, dict):
-                            extracted_value = AirQoDataUtils._extract_nested_value(
-                                value_data, target_value
-                            )
-                        else:
-                            extracted_value = value_data
-                        row_data[target_key] = extracted_value
+            if not isinstance(entry, dict):
+                return row_data
+
+            for key, value_data in entry.items():
+                target_key = data_mapping.get(key)
+                target_value = None
+                # Unwrap {"key": ..., "value": ...} mappings before the membership
+                # test: a dict is unhashable, so `target_key in row_data` would raise.
+                if isinstance(target_key, dict):
+                    target_value = target_key.get("value")
+                    target_key = target_key.get("key")
+
+                if not target_key or target_key in row_data:
+                    continue
+
+                extracted_value = (
+                    AirQoDataUtils._extract_nested_value(value_data, target_value)
+                    if isinstance(value_data, dict)
+                    else value_data
+                )
+                row_data[target_key] = extracted_value
🧰 Tools
🪛 Ruff (0.8.0)

230-230: Use data_mapping.get(key) instead of data_mapping.get(key, None)

Replace data_mapping.get(key, None) with data_mapping.get(key)

(SIM910)
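
For readers who want to try the mapping loop from the 228-243 comment in isolation, here is a minimal, self-contained sketch. The extract_nested_value helper, the map_entry function, and the sample mapping and payload are hypothetical stand-ins, not the project's code. The sketch assumes a mapping value is either a plain target name or a dict of the form {"key": ..., "value": ...}, and it unwraps dict-valued mappings before the membership test because a dict is unhashable and cannot be looked up in row_data.

```python
from typing import Any, Dict, Optional


def extract_nested_value(value_data: Dict[str, Any], path: Optional[str]) -> Any:
    """Hypothetical helper: return value_data[path], or None when absent."""
    if path is None:
        return None
    return value_data.get(path)


def map_entry(
    entry: Any, data_mapping: Dict[str, Any], row_data: Dict[str, Any]
) -> Dict[str, Any]:
    """Copy mapped fields from one raw entry into row_data (first value wins)."""
    if not isinstance(entry, dict):
        return row_data

    for key, value_data in entry.items():
        target_key = data_mapping.get(key)
        target_value = None
        # Unwrap {"key": ..., "value": ...} before testing membership.
        if isinstance(target_key, dict):
            target_value = target_key.get("value")
            target_key = target_key.get("key")

        # Skip unmapped keys and keys that already have a value.
        if not target_key or target_key in row_data:
            continue

        row_data[target_key] = (
            extract_nested_value(value_data, target_value)
            if isinstance(value_data, dict)
            else value_data
        )
    return row_data


# Assumed sample mapping and payload, for illustration only.
sample_mapping = {"pm2_5": {"key": "pm2_5", "value": "conc"}, "ts": "timestamp"}
sample_entry = {"pm2_5": {"conc": 12.5}, "ts": "2024-12-03T10:00:00Z", "extra": 1}
mapped_row = map_entry(sample_entry, sample_mapping, {})
```

Keys without a mapping (such as "extra" here) are ignored, and a non-dict entry leaves row_data unchanged.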

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between af1a032 and a9e3bbd.

📒 Files selected for processing (5)
  • src/workflows/airqo_etl_utils/airqo_utils.py (6 hunks)
  • src/workflows/airqo_etl_utils/config.py (4 hunks)
  • src/workflows/airqo_etl_utils/data_validator.py (2 hunks)
  • src/workflows/airqo_etl_utils/schema/bam_raw_measurements.json (5 hunks)
  • src/workflows/dags/airqo_bam_measurements.py (0 hunks)
💤 Files with no reviewable changes (1)
  • src/workflows/dags/airqo_bam_measurements.py
🧰 Additional context used
🪛 Ruff (0.8.0)
src/workflows/airqo_etl_utils/airqo_utils.py

437-437: Local variable mapping is assigned to but never used

Remove assignment to unused variable mapping

(F841)

🔇 Additional comments (8)
src/workflows/airqo_etl_utils/schema/bam_raw_measurements.json (1)

7-11: LGTM: Network field addition is well-structured

The new "network" field is properly defined with consistent type and mode, maintaining schema consistency.
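
As a rough illustration of what "consistent type and mode" could mean in practice, the sketch below checks that every entry in a schema fragment carries the same set of keys. The fragment is an assumption written in a BigQuery-style layout; the field names, types, and modes are illustrative and are not copied from bam_raw_measurements.json, and fields_missing_keys is a hypothetical helper.

```python
import json

# Assumed schema fragment; the type and mode values are illustrative only.
schema_json = """
[
  {"name": "timestamp", "type": "TIMESTAMP", "mode": "REQUIRED"},
  {"name": "network", "type": "STRING", "mode": "NULLABLE"}
]
"""

REQUIRED_KEYS = {"name", "type", "mode"}


def fields_missing_keys(schema: list) -> list:
    """Return the names of schema entries that lack any required key."""
    return [
        field.get("name", "<unnamed>")
        for field in schema
        if not REQUIRED_KEYS.issubset(field)
    ]


schema = json.loads(schema_json)
incomplete = fields_missing_keys(schema)
```

An empty result means every field, including the new network entry, follows the same shape.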

src/workflows/airqo_etl_utils/data_validator.py (2)

75-77: LGTM: Improved timestamp formatting with milliseconds handling

The regex pattern properly handles timestamps ending with 'Z' by adding milliseconds if missing, ensuring consistent datetime formatting.
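
The behaviour described above can be sketched roughly as follows. The pattern and the add_missing_milliseconds name are assumptions made for illustration; the actual expression in data_validator.py is not shown in this review and may differ.

```python
import re

# Assumed pattern: an ISO-8601 UTC timestamp whose seconds are followed
# directly by a trailing "Z", i.e. no fractional seconds are present.
_MISSING_MILLIS = re.compile(r"(\d{2}:\d{2}:\d{2})Z$")


def add_missing_milliseconds(timestamp: str) -> str:
    """Insert ".000" before a trailing "Z" when fractional seconds are absent."""
    return _MISSING_MILLIS.sub(r"\1.000Z", timestamp)
```

Timestamps that already carry fractional seconds, or that do not end in "Z", pass through unchanged because the pattern does not match them.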


213-213: LGTM: Streamlined device data column selection

The column selection is appropriately focused on essential device metadata fields.

src/workflows/airqo_etl_utils/config.py (3)

171-187: LGTM: Well-structured BAM sensor mapping

The new BAM mapping configuration properly defines all essential sensor parameters with a clear structure.


221-243: LGTM: Comprehensive gas sensor field mapping

The mapping includes all necessary gas sensor parameters with clear documentation for special fields like velocity and satellites.


317-325: LGTM: Consistent device configuration mapping

The device configuration properly references the new mapping structures while maintaining consistency across device types.

src/workflows/airqo_etl_utils/airqo_utils.py (2)

568-570: LGTM: Improved error logging.

The enhanced error logging provides clear context about device fetching failures, which will help with debugging and monitoring.
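
One common way to give a fetch failure this kind of context is sketched below. The fetch_devices_safely wrapper, its fetch callable, and the network argument are hypothetical; the project's actual logging call at lines 568-570 is not reproduced in this review.

```python
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)


def fetch_devices_safely(fetch: Callable[[], List[dict]], network: str) -> List[dict]:
    """Call a hypothetical fetch function; log context and return [] on failure."""
    try:
        return fetch()
    except Exception as exc:
        # logger.exception records the message together with the traceback.
        logger.exception("Failed to fetch devices for network %s: %s", network, exc)
        return []
```

Returning an empty list keeps downstream processing running; whether that is the right fallback depends on how the caller treats a missing device list.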


Line range hint 437-452: Remove unused variable and verify field mapping.

The variable mapping is assigned but never used, which could indicate:

  1. An incomplete refactoring where the old mapping was replaced with AIRQO_LOW_COST_GAS_FIELD_MAPPING
  2. A potential bug where the wrong mapping is being used

Please either:

  1. Remove the unused variable:
-            mapping = configuration.AIRQO_LOW_COST_GAS_FIELD_MAPPING
  2. Or verify if this mapping should be used instead of the field columns configuration.

Baalmart merged commit 536a74c into airqo-platform:staging on Dec 3, 2024
46 checks passed
Baalmart mentioned this pull request on Dec 3, 2024
1 task
2 participants